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(54)TitIe: GENERATION AND SELECTION OF RECOMBINANT VARIED BINDING PROTEINS 



(57) Abstract 

In order to obtain a novel binding protein against a chosen target, DNA molecules, each encoding a protein comprising 
one of a family of similar potential binding domains and a structural signal calling for the display of the protein on the outer sur- 
face of a chosen bacterial cell, bacterial spore or phage (genetic package) are introduced into a genetic package. The protein is ex- 
pressed and the potential binding domain is displayed on the outer surface of the package. The cells or viruses bearing the bind- 
ing domains which recognize the target molecule are isolated and amplified. The successful binding domains are then character- 
ized. One or more of these successful binding domains is used as a model for the design of a new family of potential binding 
domains, and the process is repeated until a novel binding domain having a desired affinity for the target molecule is obtained. In 
one embodiment, the first family of potential binding domains is related to bovine pancreatic trypsin inhibitor, the genetic pack- 
age is M13 phage, and the protein includes the outer surface transport signal of the M13 gene III protein. 
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GENERATION AND SELECTION OF RECOMBINANT 
VARIEGATED BINDING PROTEINS 



5 THeld of the Invention 

This invention relates to development of novel 
binding proteins by an iterative process of 
mutagenesis, expression, chromatographic selection, and 
amplification. 

10 

Information Disclosure Statement 

The amino acid sequence of a protein determines 
its three-dimensional (3D) structure, which in turn 
15 determines protein functioning rTCPST63. ANFI731 . The 
system of classification of protein structure of Schulz 
and Schirmer f SCHU79 . ch 5) is adopted herein. 

The 3D structure of a protein is essentially 
20 unaffected by the identity' of the amino acids at some 
loci; at other loci only one or a few types of amino 
acid is allowed r sHORSS. ktskss. REID88) . Generally, 
loci where wide variety is allowed have the amino acid 
side group directed toward the solvent. While limited 
25 variety is allowed where the side group is directed 
toward other parts of the protein. (See also SCHU79, 
pl69-171 and CREI84 . p239-245, 314-315). 

The secondary structure (helices, sheets, turns, 
30 loops) of a protein is determined mostly by local 
sequence. Certain amino acids tend to be correlated 
with certain secondary structures and the commonly used 
Chou-Fasman ( CHOU74, CHOti78a. CHOU78b ) rules depend on 
these correlations. However, every amino acid type has 
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been observed in helices and in both parallel and 
antiparallel sheets. Pentapeptides of identical 
sequence are found in different proteins; in some cases 
the conformations of the pentapeptides are very 
5 different f KABS84, ARGQ87 1 • 

Turns and loops tolerate insertions and deletions 
more readily than do other secondary structures 
(RICH81, THOR88, SUTC87a) ; related proteins differ most 
10 in loops and turns. 

Changing three residues in subtilisin from 
Bacillus amYloliquefaciens to be the same as the 
corresponding residues in subtilisin from 

15 licheniformis produced a protease that had nearly the 
same activity as the subtilisin from the latter 
organism; 82 differences remained in the sequences. 
The three residues changed were chosen because they 
were the only differences within 7 Angstroms (A) of the 

20 active site (WELL87a) . 

Schulz and Schirmer summarize many observations on 
the binding of proteins to other molecules (SCHU79, 
p98-105) . For example, haemoglobin alpha chains bind 

25 very tightly to haemoglobin beta chains (delta G more 
negative than -11.0 Kcal/mole) ; antibodies bind tightly 
to antigens (K^s range from 10" 6 . to 10 ~ 14 M, is the 
dissociation constant equal to [A] [B]/[A:B] ) ; basic 
bovine pancreatic trypsin inhibitor (BPTI) binds 

30 tightly to trypsin « 6.0 x 10~ 14 M (2SCH87) , delta 
G - -18.0 Kcal/mole); and avidin binds to biotin (K d = 
1.3 x 10" 15 M (CREI84, p362)). In each case the 
binding results from complementarity of the surfaces 
that come into contact: bumps fit into holes, unlike 

35 charges come together, dipoles align, and hydrophobic 
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atoms contact other hydrophobic atoms. Although bulk 
water is excluded, individual water molecules are 
frequently found filling space in intermolecular 
interfaces; these waters usually form hydrogen bonds to 
5 one or more atoms of the protein or to other bound 
water. 

The factors affecting protein binding are known, 
. f CHOT75. CHOT76. SCHU79 . p98-107, and CREI84 , Ch8) , but 

10 designing new complementary surfaces has proved 
difficult. Although some rules have been developed for 
substituting side groups (SUTC87b) , the side groups of 
proteins are floppy and it is difficult to predict what 
conformation a new side group will take. Further, the 
15 forces that bind proteins to other molecules are all 
relatively weak and it is difficult to predict the 
effects of these forces. Hence, it is difficult to 
design superior binding proteins based on theory alone 
(QUI087) . 

20 

Enzyme-substrate affinity, however, has 
fortuitously been increased by protein engineering 
(WILK84) . A point mutant of tyrosyl tRNA synthetase of 
Bacillus stearothermophilus exhibits a 100-fold 

25 increase in affinity for ATP. Substitution of one 
amino acid for another at a surface locus may 
profoundly alter binding properties of the protein 
other than substrate binding, without affecting the 
tertiary structure of the protein. For example, in 

30 sickle-cell haemoglobin the change of the surface 
residue E6 to V in the beta chains causes 

» 

deoxyhaemoglobin-S' to form fibers through self binding 
f DlCK8 3 . pl25-145); the tertiary and quaternary, 
structure of the haemoglobin are not changed (PADL85 , 
35 WISH75. WISH761 . 



WO 90/02809 PCT/US89/03731 



Changing a single amino acid in BPTI greatly 
reduces its binding to trypsin, but some of the new 
molecules retain the parental characteristics of 
5 binding to and inhibiting chymotrypsin, while others 
exhibit new binding to elastase ( TANK77; TSCH87 ) . 
Changes of single amino acids on the surface of the 
lambda Cro repressor greatly reduce its affinity for 
the natural operator Or3, but greatly increase the 
10 binding of the mutant protein to a mutant operator 
(EISE85 ) • Thus changing the surface of a binding 
protein may alter its specificity without abolishing 
binding activity • 

15 The recently developed techniques of "reverse 

genetics" have been used to produce single specific 
mutations at precise base pair loci ( OLIP86, OLIP87, 
and AUSU87 ) . Mutations are generally detected by 
sequencing and in some cases by loss of wild-type 

«20 function. These procedures allow researchers to 
analyze the function of each residue in a protein 
( MILL88 ) or of each base pair in a regulatory DNA 
sequence ( CHEN88 ) . In these analyses , the norm has 
been to strive for the classical goal of obtaining 

25 mutants carrying a single alteration (AUSU87) • 

Reverse genetics is often applied to coding 
regions to determine which residues are most important 
to protein structure and function; isolation of a 
30 single mutant at each residue of the protein gives an 
initial estimate of which residues play crucial roles. 

Prior to the method of the present invention, two 
general approaches have been developed to create novel 
35 mutant proteins through reverse genetics. In one 
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approach, dubbed "protein surgery" (DILL87) , a specific 
substitution is introduced at a single protein residue 
to determine the effects on structure and function of 
specific substitutions ( CRAI851 fRAOS87^ ( RASTTfl7) . 
5 However, many desirable protein alterations require 
multiple amino acid substitutions and thus are not 
accessible through single base changes or even through 
all possible amino acid substitutions at any one 
residue. 

10 

The other approach has been randomly to generate a 
variety of mutants at many loci within a cloned gene 
using mutagenic chemicals or radiation. The specific 
location and nature of the change are determined by DNA 
15 sequencing. (PAgU86.) This approach is limited by the 
number of colonies that can be examined. Also, it does 
not take advantage of any knowledge of the protein 
structure and its relationship to binding activity. 

20 Progress toward rules governing substitutions of 

amino acids (ULME83) has been greatly hampered by the 
extensive efforts involved in using either method and 
the practical limitations on the number of colonies 
that can be inspected (ROBE86) . 

25 

The term "saturation mutagenesis" with reference 
to synthetic DNA is generally taken to mean generation 
of a population in which: a) every possible single-base 
change within a fragment of a gene of DNA regulatory 

>0 region is represented, and b) most mutant genes contain 
only one mutation. Thus a set of all possible single 
mutations for a 6 base pair length of DNA comprises a 
population of 18 mutants. Oliphant et al^. (OLIP86) and 
Oliphant and Struhl (OLIP87) have demonstrated ligation 

5 and cloning of highly degenerate oligonucleotides and 



have applied saturation mutagenesis to the study of 
promoter sequence and function. They suggest that 
similar methods could be used to study genetic 
expression of proteins, but they do not say how to: a) 
choose protein residues to vary, or b) select or screen 
mutants with desirable properties. 

Reidhaar-Olson and Sauer (REID88) have used 
synthetic degenerate oligo-nts to vary simultaneously 
two or three residues through all twenty amino acids in 
the dimer interface of cl repressor from bacteriophage 
lambda. They give no discussion of the limits on how 
many residues could be varied at once nor do they 
mention the problem of unequal abundance of DNA 
encoding different amino acids. They looked for 
proteins that either had wild-type dimerization or that 
did not dimerize. They did not seek proteins having 
novel binding properties and did not report any. 

Several researchers have designed and synthesized 
proteins de novo . These designed proteins are small 
and most have been synthesized in vitro as polypeptides 
rather than genetically. Gutte and colleagues have 
made a polypeptide that binds DDT in 55% ethanol 
(M0SE83). Recently Moser et ai^. (MOSE87) reported 
genetic expression in JL. coli both of the designed 24 
residue DDT-binding protein and of fusions of the DDT- 
binding sequence to LacZ. They state that design of 
biologically active proteins is currently impossible. 

Erickson et al^ (ERIC86) have designed and 
synthesized a series of proteins that they have named 
betabellins, that are meant to have beta sheets. They 
suggest use of polypeptide synthesis with mixed 
reagents to produce several hundred analogous 
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betabellins, and use of a column to recover analogues 
with high affinity for a chosen target compound bound 
to the column. They envision successive rounds of 
mixed synthesis of variant proteins and purification by 
5 specific binding. They do not discuss how residues 
should be chosen for variation. Because proteins 
cannot be amplified, the researchers must sequence the 
recovered protein to learn which substitutions improve 
binding. The researchers must limit the level of 
10 diversity so that each variety of protein will be 
present in sufficient quantity for the isolated 
fraction to be sequenced. 

Methods have been developed to separate cells 
15 through their affinity to various substances. Methods 
applied to animal cells reveal common problems: a) non- 
specific interactions between cells and affinity 
supports, and b) irreversible binding of cells to 
affinity matrices (BONN85) . 

20 

Ferenci and collaborators have published a series 
of papers on the chromatographic isolation of mutants 
of the maltose-transport protein LamB of L. coli 
(WAND79, FERE80a, FERE80b, FERE80C, FERE82a, FERE82b, 

25 FERE83, CLUN84, FERE86a, FERE86b, FERE86C, FERE87a, 
FERE87b, HEIN87, and HEIN88) . The papers report that 
spontaneous and induced mutants at the lamB genetic 
locus can be isolated by chromatography over a column 
supporting immobilized maltose, maltodextrins , or 

30 starch. The reports speculate that other applications 
are possible, but specifically mention only the 
elucidation of the residues responsible for the 
selectivity of the maltodextrin pore or similar pore 
proteins. The mutant proteins were non-chimeric, and 

35 no attempt was made to obtain binding to a new target. 
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Both FERE8 6a and CLUN84 point up the 
difficulties of working with live bacteria that can 
metabolize chemicals and change their physiological 
5 behavior during the chromatographic experiment. 

A fragment of a heterologous gene can be 
introduced into bacteriophage Fl gene III (SMIT85) . If 
the inserted gene preserves the original reading frame , 

10 expression of the altered gene III causes an inserted 
domain to appear in the gene III protein. The 
resulting strain of fl virions are adsorbed by an 
antibody against the protein encoded by the 
heterologous DNA. The phage were eluted at pH 2.2 and 

15 retained some infectivity. However, the single copy of 
fl gene III was used for insertion of the heterologous 
gene so that all copies of gene III protein were 
affected; infectivity of the resultant phage was 
reduced 25- fold. 

20 

Smith presented his method as a way to isolate 
cloned genes using antibodies to the gene products. He 
made no mention of mutagenizing the inserted genetic 
material or of inducing novel binding properties in the 
25 inserted protein domain. 

A fragment of the repeat region of the 
circumsporozoite protein from Plasmodium falciparum has 
been expressed on the surface of M13 as an insert in 
30 the gene III protein (CRUZ88) . The recombinant phage 
were both antigenic and immunogenic in rabbits. The 
authors do not suggest mutagenesis of the inserted 
material . 
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Gene fragments coding for hepatitis B virus 
antigens have been fused to fragments of lamB. and if 
the fusion is in a region coding for exposed domains of 
LamB, the HBV antigens appear on the cell surface and 
5 are immunogenic (CHAR87) . Charbit et aJU (CHAR87) 
suggest use of these engineered strains for development 
of a live bacterial vaccine; they did not suggest 
mutagenesis of the fused heterologous gene fragments, 
nor development of binding capabilities. 

10 

Ladner, US Patent No* 4,704,692, "Computer Based 
System and Method for Determining and Displaying 
Possible Chemical Structures for Converting Double- or 
Multiple-Chain Polypeptides to Single-Chain 

15 Polypeptides" describes a design method for converting 
proteins composed of two or more chains into proteins 
of fewer polypeptide chains, but with essentially the 
same 3D structure. There is no mention of variegated 
DNA and no genetic selection. Ladner and Bird, 

20 WO88/01649 (Publ. March 10, 1988) disclose the specific 
application of computerized design of linker peptides 
to the preparation of single chain antibodies. 

Ladner, Glick and Bird, WO88/06630 (publ. 7 Sept. 

25 1988) (LGB) speculate that diverse single chain 
antibody domains may be screened for binding to a 
particular antigen by varying the DNA encoding the 
combining determining regions of a single chain 
antibody, subcloning the SCAD gene into the gpV gene of 

30 phage lambda so that a SCAD/gpV chimera is displayed on 
the outer surface of the phage, and selecting phage 
which bind to the antigen - through affinity 
chromatography. The only antigen mentioned is bovine 
growth hormone. No other binding molecules, targets, 

35 carrier organisms, or outer surface proteins are 



WO 90/02809 



PCT/US89/03731 



10 



10 



15 



20 



25 



30 



discussed. Nor is there any mention of the method or 
degree of mutagenesis. 

Ladner and Bird, WO88/06601 (publ. 7 September 
19 88) suggest that single chain "psuedodimeric" 
repressors (DNA-binding proteins) may be prepared by 
mutating a putative linker peptide followed by in vivo 
selection that mutation and selection may be used to 
create a dictionary of recognition elements for use in 
the design of asymmetric repressors. The repressors 
are not displayed on the outer surface of an organism. 

No admission is made that any cited reference is 
prior art or pertinent prior art, and the dates given 
are those appearing on the reference and may not be 
identical to the actual publication date. 

SUMMARY OF THE INVENTION 

This invention relates to the construction, 
expression, and selection of mutated genes that specify 
novel proteins with desirable binding properties, as 
well as these proteins themselves. The substances 
bound by these proteins, hereinafter referred to as 
"targets", may be, but need not be, proteins. Targets 
may include other biological or synthetic 
macromolecules as well as organic and inorganic 
molecules . 

The novel binding proteins may be obtained: 1) by 
mutating a gene encoding a known binding protein within 
the subsequence encoding a known binding domain, or 2) 



by taking such a subsequence of the gene for a first 
protein and combining it with all or part of a gene for 
a second protein (which may or may not be itself a 
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known binding protein) , 3 ) by mutating a gene encoding 
a protein which, while not possessing a known binding 
activity, possesses a secondary or higher structure 
that lends itself to binding activity (clefts, grooves, 
etc. ) , or 4 ) by mutating a gene encoding a known 
binding protein but not in the subsequence known to 
cause the binding. The protein from which the novel 
binding protein is derived need not have any specific 
affinity for the target material. 

In one embodiment, the invention relates to: 

a) preparing a variegated population of replicable 
genetic packages, each package including a nucleic 
acid construct coding on expression for an outer- 
surface-displayed potential binding protein 
comprising (i) a structural signal directing the 
display of the protein on the outer surface of the 
package and (ii) a potential binding domain for 
binding said target, where a plurality of 
different potential binding domains are displayed 
by the individual packages, 

b) causing the expression of said protein and the 
display of said protein on the outer surface of 
such packages, 

c) contacting the packages with target material so 
that the potential binding domains of the proteins 
and the target material may interact, and 
separating packages bearing a potential binding 
domain that succeeds in binding the target 
material from packages that do not so bind, 
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d) recovering and replicating at least one package 
bearing a successful binding domain, 

(e) determining the amino acid sequence of the 
5 successful binding domain of a genetic package 

which bound to the target material, 

(f) preparing a new variegated population of 
replicable genetic packages according to step (a) , 

10 the parental potential binding domain for the 

potential binding domains of said new packages 
being a successful binding domain whose sequence 
was determined in step (e) , and repeating steps 
(b)-(e) with said new population, and, when a 

15 package bearing a binding domain of desired 

binding characteristics is obtained, 

(g) abstracting the gene encoding the desired 
binding domain from the genetic package and 

20 placing it into a suitable expression system. 

(The binding domain may then be expressed as a 
unitary protein, or as a domain of a larger 
protein) - 

25 The invention further relates to a method of 

preparing a mixed population of replicable genetic 
packages in which each package includes a gene 
expressing a potential binding protein in such a manner 
that the protein is presented on the outer surface of 

30 the package. This method comprises: 

i) preparing a variegated population of DNA 
inserts of each of which comprises a first 
sequence which codes on expression for a potential 
35 binding domain and, a second sequence encoding 
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signal directing that the encoded protein be 
displayed on the outer surface of a chosen 
replicable genetic package, and 

ii) incorporating, the resulting population of DNA 
constructs into the chosen replicable genetic 
packages to produce a population of replicable 
genetic packages. 

In a preferred embodiment, the potential-binding- 
protein-encoding inserts are incorporated into a gene 
encoding an outer-surface protein of the replicable 
genetic package. 

The invention encompasses the design and synthesis 
of variegated DNA encoding a family of potential 
binding proteins characterized by constant and variable 
regions, said proteins being designed with a view 
toward obtaining a protein that binds a predetermined 
target. 

For the purposes of this invention, the term 
"potential binding protein" refers to a protein encoded 
by one species of DNA molecule in a population of 
variegated DNA wherein the region of variation appears 
in one or more subsequences encoding one or more 
segments of the polypeptide having the potential of 
serving as a binding domain for the target substance. 

From time to time, it may be helpful to speak of 
the "parent sequence" of the variegated DNA. When the 
novel binding domain sought is an analogue of a known 
binding domain, the parent sequence is the sequence 
that encodes the known binding domain. The variegated 
DNA will be identical with this parent sequence at most 
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loci, but will diverge from it at chosen loci. When a 
potential binding domain is designed from first 
principles, the parent sequence is a sequence which 
encodes the amino acid sequence that has been predicted 
5 to form the desired binding domain, and the variegated 
DNA is a population of "daughter DNAs" that are related 
to that parent by a high degree of sequence similarity. 

The fundamental principle of the invention is one 
10 of forced evolution The efficiency of the forced 
evolution is greatly enhanced by careful choice of 
which residues are to be varied. The 3D structure of 
the potential binding domain is a key determinant in 
this choice. First a set of residues that can 
15 simultaneously contact one molecule of the target is 
identified. Then all or some of the codons encoding 
these residues are varied simultaneously to produce a 
variegated population of DNA. The variegated 
population of DNA is used to transform cells so that a 
20 variegated population of genetic packages is produced. 

The mixed population of genetic packages 
containing genes encoding possible binding proteins is 
enriched for packages containing genes that express 

25 proteins that in fact bind to the target ("successful 
binding domains"). After one or more rounds of such 
enrichment, one or more of the chosen genes are 
examined and sequenced. If desired, new loci of 
variation are chosen. The selected daughter genes of 

30 one generation then become the parent sequences for the 
next generation of variegated DNA, beginning the next 
"variegation cycle," Such cycles are continued until a 
protein with the desired target affinity is obtained. 



WO 90/02809 



PCT/US89/03731 



15 

The appended claims are hereby incorporated by 
reference into this specification as an enumeration of 
the preferred embodiments. 

5 BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a schematic showing the relationships 
between various types of Binding Domains (BD) . 

10 Figure 2 is a flow chart showing the major steps used 
to create a novel protein with affinity for a pre- 
determined target. 

Figure 3 is a schematic of a PBD contacting a molecule 
15 of target material. 

Figure 4 is a schematic of the construction of pLG3 
from M13mpl8 and pBR322. 

20 Figure 5 is a schematic of the construction of pLG7 
from pLG3 and synthetic DNA. 

DETAILED DESCRIPTION OF THE INVENTION 

25 Sec. 0.1: Overview: 

The present invention separates mutated genes that 
specify novel proteins with desirable binding 
properties from closely related genes that specify 

30 proteins with no or undesirable binding properties, by: 
1) arranging that the product of each mutated gene be 
displayed on the outer surface of a replicable genetic 
package that contains the gene, and 2) using affinity 
separation incorporating a desirable target material to 

35 enrich the population of packages for those packages 
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containing genes specifying proteins with improved 
binding to that target material. 

Let Kq (x,y) be a dissociation constant, 

[x] [y] 

K D (x,y) = • 

[x:y] 

For the purposes of the appended claims, a protein 
P is a binding protein if 

(1) for one molecular , ionic or atomic species A, 
the dissociation constant K D (P,A) 

< 1CT 6 moles/liter, and 

(2) for a different molecular, ionic or 
atomic species B, (P f B) > 10" 1 
moles/liter. 

As a result of these two conditions, the protein P 
exhibits specificity for A over B, and a minimum degree 
of affinity (or avidity) for A, 

When a domain of a protein is primarily 
responsible for the protein's ability to specifically 
bind a chosen target, it is referred to herein as a 
"binding domain" (BD) . We engineer the appearance of a 
stable protein domain, denoted as an "initial potential 
binding domain" (IPBD) , on the surface of a genetic 
package. The present invention is concerned with the 
expression of numerous, diverse, variant "potential 
binding domains" (PBD) , all related to a "parental 
potential binding domain" (PPBD) such as the binding 
domain of a known binding protein, and with selection 
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and amplification of the genes encoding the most 
successful mutant PBDs. An IPBD is chosen as PPBD to 
the first round of variegation. Selection-through- 
binding isolates one or more "successful binding 
5 domains" (SBD) . An SBD from one round of variegation 
and selection-through-binding is chosen to be the PPBD 
for the next round. The invention is not, however, 
limited to proteins with a single BD since the method 
may be applied to any or all of the BDs of the protein, 
10 sequentially or simultaneously. The relationships of 
the various BDs are illustrated in Figure 1. 

The term "variegated DNA" refers to a population 
of molecules that have the same base sequence through 

15 most of their length, but that vary at a limited number 
of defined loci, preferably 5-10 codons. A molecule of 
variegated DNA can be introduced into a plasmid so that 
it constitutes part of a gene (CLIPS 6, 0LIP87, AUSU87, 
REID88) . When plasmids containing variegated DNA are 

20 used to transform bacteria, each cell makes a version 
of the original protein. Each colony of bacteria may 
produce a different version from any other colony. If 
the variegations of the DNA are concentrated at loci 
known to be on the surface of the protein or in a loop, 

25 a population of proteins will be generated, many 
members of which will fold into roughly the same 3D 
structure as the parent protein. The specific binding 
properties of each member, however, may be different 
from each other member. It remains to sort out the 

30 colonies containing genes for proteins with desirable 
binding properties from those that do not exhibit the 
desired affinities. 

A "single-chain antibody" is a single chain 
35 polypeptide comprising at least 200 amino acids, said 
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amino acids forming two antigen-binding regions 
connected by a peptide linker that allows the two 
regions to fold together to bind the antigen. Either 
the two antigen-binding regions must be variable 
domains of known antibodies , or they must (1) each fold 
into a beta barrel of nine strands that are spatially 
related in the same way as are the nine strands of 
known antibody variable light or heavy domains, and (2) 
fit together in the same way as do the variable domains 
of said known antibody. Generally speaking , this will 
require that, with the exception of the amino acids 
corresponding to the hypervariable region, there is at 
least 88% homology with the amino acids of the variable 
domain of a known antibody. 

The term "affinity separation means" includes, but 
is not limited to: a) affinity column chromatography, 
b) batch elution from an affinity matrix material, c) 
batch elution from an affinity material attached to a 
plate, d) fluorescence activated cell sorting, and e) 
electrophoresis in the presence of target material. 
"Affinity material" is used to mean a material with 
affinity for the material to be purified, called the 
"analyte 11 . In most cases, the association of the 
affinity material and the analyte is reversible so that 
the analyte can be freed from the affinity material 
once the impurities are washed away. 

Affinity column chromatography, batch elution from 
an affinity matrix material held in some container, and 
batch elution from a plate are very similar and 
hereinafter will be treated under "affinity 
chromatography. " 
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Fluorescent-activated cell sorting involves use of 
an affinity material that is fluorescent £er se or is 
labeled with a fluorescent molecule. Current 
commercially available cell sorters require 800 to 1000 
5 molecules of fluorescent dye, such as Texas red, bound 
to each cell. FACS can sort 10 3 cells or viruses/sec. 

Electrophoretic affinity separation involves 
electrophoresis of viruses or cells in the presence of 
10 target material, wherein the binding of said target 
material changes the net charge of the virus particles 
or cells. It has been used to separate bacteriophages 
on the basis of charge. (SERW87) . 

15 The present invention makes use of affinity 

separation of bacterial cells, or bacterial viruses (or 
other genetic packages) to enrich a population for 
those cells or viruses . carrying genes that code for 
proteins with desirable binding properties. 

20 

In the present invention, the words "select" and 
"selection" are used exclusively in the genetic sense? 
i.e. a biological process whereby a phenotypic 
characteristic is used to enrich a population for those 
25 organisms displaying the desired phenotype. 

The process of the present invention comprises 
three major parts: 

30 I. design and production of a replicable 

genetic package (GP) that displays an IPBD on 
the surface of the GP, denoted GP(IPBD), 

II. design and implementation of an affinity 
35 separation process that separates GP(IPBD)s 
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that bind to a known affinity molecule from 
wild- type GPs or GP(IPBD~)s, neither of which 
binds the known affinity molecule, and 

5 III. design and implementation of a genetic 

variegation method, denoted structure- 
directed mutagenesis, wherein a population of 
10 6 or more different GP(PBD)s, denoted 
GP(vgPBD), is produced. 

10 

One affinity separation is called a "separation cycle"; 
one pass of variegation followed by as many separation 
cycles as are needed to isolate an SBD, is called a 
"variegation cycle". The amino acid sequence of one 
15 SBD from one round becomes the PPBD to the next 
variegation cycle. We perform variegation cycles 
iteratively until the desired affinity and specificity 
of binding between an SBD and chosen target are 
achieved. 

20 

Part I is a strain construction in which we deal 
with a single IPBD sequence. Variability may be 
introduced into DNA subsequences adjacent to the ipbd 
subsequence and within the oso-ipbd gene so that the 

25 IPBD will appear on the GP surface, A molecule, such 
as an antibody, having high affinity for correctly 
folded IPBD is used to: a) detect IPBD on the GP 
surface, b) screen colonies for display of IPBD on the 
GP surface, or c) select GPs that display IPBD from a 

30 population, some members of which might display IPBD on 
the GP surface* In one preferred embodiment, Part I of 
the process involves: . 



35 



1) choosing a GP such as a bacterial cell (Sec. 
1.1.1), bacterial spore (1.2.1), or phage (1.3.1), 
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having a suitable outer surface protein (Sees* 
1.1.3, 1.2-3, and 1.3-3) , 



2) choosing a stable IPBD (Sec. 2), 

5 

3) designing an amino acid sequence that: a) 
includes the IPBD as a subsequence and b) will 
cause the IPBD to appear on the GP surface (Sees. 
1.1.2, 1-2.2, 1.3.2, and 4), 

10 

4) engineering a gene, denoted osp-ipbd. that: a) 
codes for the designed animo acid sequence, b) 
provides the necessary genetic regulation, and c) 
introduces convenient sites for genetic 

15 manipulation (Sees. 4.1, 4.2, 4.3, 5.1, and 5.2), 

5) cloning the osp-ipbd gene into the GP (Sec. 
6.1), and 

20 6) harvesting the transformed GPs (Sec. 7) and 

testing them for presence of IPBD on the GP 
surface (Sec. 8) ; this test is performed with an 
affinity molecule having high affinity for IPBD, 
denoted AfM(IPBD) - 

25 

In another preferred embodiment, Part I of the process 
involves: 

1) and 2) as above 

3) designing a DNA sequence that: a) encodes the 
IPBD as a subsequence and b) contains suitable 
restriction sites so that random DNA may be 
operably linked to the ipbd gene fragment; and c) 
provides the necessary genetic regulations; this 
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DNA sequence is called a "display probe", (Sees. 
1.1.4, 1.2.4, 1.3.4 and 4), 

4) constructing that display probe, 

5) cloning the display probe into and amplifying 
it in a suitable host into the OCV, 

6) cloning random or pseudorandom DNA into one of 
the restriction sites provided in the display 
probe, (Sec. 6.2), whereby the random or 
pseudorandom DNA functions as a potential osp , and 

7) harvesting GPs (Sec. 7) screening colonies of 
the transformed GPs for presence of IPBD on the GP 
surface; this screening is performed with an 
affinity molecule having high affinity for IPBD, 
denoted AfM ( IPBD ) , ( Sec . 8 ) ; or , alternatively ; 

8) selecting GPs that display IPBD by use of an 
affinity separation using AfM (IPBD) , (Sec. 8) . 

Once a GP(IPBD) is produced, it can be used many 
times as the starting point for developing different 
novel proteins that bind to a variety of different 
targets. The knowledge of how we engineer the 
appearance of one IPBD on the surface of a GP can be 
used to design and produce other GP(IPBD)s that display 
different IPBDs. 

Although Part I deals with only a single IPBD, 
many preparations are made for Part III where we 
introduce numerous mutations into the potential binding 
domain. References to PBD or pbd in Part I are to 
indicate a preparatory intent. 
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In Part II we optimize separation of GP(IPBD) from 
wild-type GP, denoted wtGP, based on the affinity of 
IPBD for AfM(IPBD) and establish the sensitivity of the 
affinity separation process. In a preferred 
embodiment, Part II of the process of the present 
invention involves: 

1) preparing affinity columns bearing AfM(IPBD) at 
various densities of AfM(IPBD)/ (volume of matrix), 
(Sec. 10.1), 

2) preparing GP(IPBD)s with various amounts of 
IPBD per GP, 

3) picking a gradient regime for eluting the 
columns (Sec. 10.1), 

4) determining which combination of: a) IPBD/GP, 
b) density of AfM( IPBD)/ (volume of support), c) 
initial ionic strength, d) elution rate, and e) 
(amount of GP) / (volume of support) loaded, gives 
the best separation of GP(IPBD) from wtGP (Sec. 
10.1), 

5) determining the smallest amount of GP(IPBD) 
that can be isolated from a much larger amount of 
wtGP using the optimal condition, (Sec. 10.2), and 

6) determining the efficiency of the affinity 
separation procedure (Sec. 10.3). 

Part II optimizes separation of a single type of 
GP(IPBD) from a large excess of a single different GP. 
The optimum conditions will be used in Part III to 
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separate GP(PBD)s that bind the target from GP(PBD)s 
that do not bind the target. The optimization will be 
at one or more specific temperatures and at one or more 
specific pHs. In Part III, the user must specify the 
conditions under which the selected SBD should bind the 
target. If the conditions of intended use differ 
markedly from the conditions for which affinity 
separation was optimized , the user must return to Part 
II and optimize the affinity separation for conditions 
similar to the conditions of intended use of the 
selected SBD. 

In Part III, we choose a target material and a 
GP(IPBD) that was developed by the method of Part I and 
that is suitable to the target material. Using IPBD as 
the PPBD to the first cycle of variegation, we prepare 
a wide variety of osp-obd genes that encode a wide 
variety of PBDs. We use an affinity separation, 
developed by the method of Part II, to enrich the 
population of GP(vgPBD)s for GPs that display PBDs with 
binding properties relative to the target that are 
superior to the binding properties of the PPBD. An SBD 
selected from one variegation cycle becomes the PPBD to 
the next variegation cycle. In a preferred embodiment, 
Part III of the process of the present invention 
involves: 

1) picking a target molecule (Sec. 11) , 

2) picking a GP(IPBD) (Sec. 12), 

3) picking a set of several residues in the PPBD' 
to vary based on a) the 3D structure of the IPBD, 
b) sequences of homologous proteins, and c) 
computer or theoretical modeling that indicates 
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which residues can tolerate different amino acids 
without disrupting the underlying structure (Sec. 
13.1), 

5 4) picking a subset of the residues to be varied 

simultaneously based on the number of different 
variants and which variants are within the 
detection capabilities of the affinity separation; 
(Sec. 13.2); 

10 

5) implementing the variegation by: 

a) synthesizing the part of the osp-pbd gene 
that encodes the residues to be varied using a 
specific mixture of nucleotide substrates for 
some or all of the bases encoding residues 
slated for variation, thereby creating a 
population of DNA molecules, denoted vgDNA 
(Sec. 13.3), 

b) ligating this vgDNA, by standard methods, 
into the operative cloning vector (OCV) (e.g. 
a plasmid or bacteriophage) (Sec. 14.1), 

25 c) using the ligated DNA to transform cells, 

thereby producing a population of transformed 
cells (Sec. 14.2), 

d) culturing ( i.e. increasing in number) the 
population of transformed cells and harvesting 
the population of GP(PBD)s, said population 
being denoted as GP(v'gPBD), (Sec. 14.3), 
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20 



30 
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e) enriching the population for GPs that bind 
the target by using the affinity separation 
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process developed in Peart II , with the chosen 
target molecule as affinity molecule (Sec. 15) , 

f) repeating steps III.5.d and Ill.S.e until a 
GP(SBD) having improved binding to the target 
is isolated (Sec, 15) , and 

g) testing the isolated SBD or SBDs for 
affinity and specificity for the chosen target 
(Sec. 15.8), 

6) repeating steps III. 3, III. 4, and III. 5 until 
the desired degree of binding is obtained. 

Part III is repeated for each new target material. 
Part I need be repeated only if no GP(IPBD) suitable to 
a chosen target is available. Part II need be repeated 
for each newly-developed GP(IPBD) and for previously- 
developed GP(IPBD)s if the intended conditions of use 
of a. novel binding protein differ significantly from 
the conditions of previous optimizations. 

Sec. 0.2: Abbreviations: 

The following abbreviations will be used 
throughout the present invention: 

Abbreviation Meaning 

GP Genetic Package , e.g. a 





bacteriophage 



- X 



Any protein 



The gene for protein X 
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Initial Potential Binding 
Domain, e.g. BPTI 

Potential Binding Domain, e.g. 
a derivative of BPTI 

Successful Binding Domain, 
e.g. a derivative of BPTI 
selected for binding to a 
target 

Parental Potential Binding 
Domain, i.e. an IPBD or an SBD 
from a previous selection 

Outer Surface Protein, e.g. 
coat protein of a phage or 
LamB from coli 

Fusion of an OSP and a PBD, 
order of fusion not specified 

Outer Surface Transport Signal 

A genetic package containing 
the x gene 

A genetic package that 
displays X on its outer 
surface 

An affinity matrix supporting 
"Q" , e.g. {T4 lysozyme} is T4 
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lysozyme attached to an 
affinity matrix 



AfM(W) 



A molecule having affinity for 
"W", e.g, trypsin is an 
AfM(BPTI) 



10 



XINDUCE 



A chemical that can induce 
expression of a gene, e.g. 
IPTG for the lacUVS promoter 



OCV 



Operative Cloning Vector 



15 



Rp - [T][SBD]/[T:SBD] (T is a 
target) 



K N 



% » [N] [SBD]/[N:SBD] (N is a 
non-target) 



20 



DoAMoM 



Density of AfM(W) on affinity 
matrix 



25 



Abun (x) 

OMP 
nt 



Abundance of DNA molecules 
encoding amino acid x 

Outer membrane protein 

nucleotide 
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A bimolecular dissociation 
constant, 1^= [A][B]/[A:B] 



35 



'err 



Error level in synthesizing 
vgDNA 
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Sac. 0.3: Standard sequencing method; 

The present invention is not limited to a single 
5 method of determining the sequence of nucleotides (nts) 
in DNA subsequences. Sequencing reactions, agarose gel 
electrophoresis, and polyacrylamide gel electrophoresis 
(PAGE) are performed by standard procedures (AUSU87) . 

10 The present invention is not limited to a single 

method of determining protein sequences, and reference 
in the appended claims to determining the amino acid 
sequence of a domain is intended to include any 
practical method or combination of methods, whether 
15 direct or indirect. The preferred method, in most 
cases, is to determine the sequence of the DNA that 
encodes the protein and then to infer the amino acid 
sequence. In some cases, standard methods of protein- 
sequence determination may be needed to detect post- 
20 translational processing. 

*** — — 

The major steps in the process of making and 
25 isolating a novel binding protein with affinity for a 
chosen target material are illustrated in Figure 2. 

Sec, it Specification of Genetic Package a nd Means for 
Displaying a Heterologous Binding Doma in On Its Outer 
30 Surface ; 

Sec. 1.0: General Requirements for Genetic Packages 

It is emphasized that the GP on which select ion- 
35 through-binding will be practiced must be capable, 
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after the selection, either of growth in some suitable 
environment or of in vitro amplification and recovery 
of the encapsulated genetic message. During at least 
part of the growth, the increase in number must be 
5 approximately exponential with respect to time- The 
component of a population that exhibits the desired 
binding properties may be quite small, for example, one 
in 10 6 or less. Once this component of the population 
is separated from the non-binding components, it must 
10 be possible to amplify it. Culturing viable cells is 
the most powerful amplification of genetic material 
known and is preferred. Genetic messages can also be 
amplified in vitro , but this is not preferred. 

15 A GP may typically be a vegetative bacterial cell, 

a bacterial spore or a bacterial DNA virus. A strain 
of any living cell or virus is potentially useful if 
the strain can be: 

20 1) maintained in culture, 

2) affinity separated and retain its viability, 

3) genetically altered with reasonable facility, 
25 and 

4) manipulated to display the potential binding 
protein domain where it can interact with the 
target material during affinity separation. 

30 

DNA encoding the IPBD sequence may be operably 
linked to DNA encoding at least the outer surface 
transport signal of an outer surface protein (OSP) 
native to the GP so that the IPBD is displayed on the 
35 outer surface of the GP. It should be possible to 
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cause a genetic package to display the IPBD or PBD on 
its outer surface without adversely affecting the 
viability of the GP or the binding characteristics of 
the IPBD or PBD, if the fusion is near domain 
5 boundaries ( BECK83 . CRAW87, TOTH86, SMIT85, MAN086; and 
cf . ROSS81, HOLL83) . 



Those characteristics of a protein that are 
10 recognized by a cell and that cause it to be 
transported out of the cytoplasm and displayed on the 
cell surface will be termed "outer-surface transport 
signals", 

15 The replicable genetic entity (phage or plasmid) 

that carries the osp-obd genes (derived from the ogB^ 
jpbd gene) through the selection-through-binding 
process, see Sec. 14, is referred to hereinafter as the 
operative cloning vector (OCV) . When the OCV is a 

20 phage, it may also serve as the genetic package. The 
choice of a GP is dependent in part on the availability 
of a suitable OCV and suitable OSP. 

Preferably, the GP is readily stored, for example, 
25 by freezing. If the GP is a cell, it should have a 
short doubling time, such as 20-40 minutes. If the GP 
is a virus, it should be prolific, e.g., a burst size 
of at least 100/ infected cell. GPs which are finicky 
or expensive to culture are disfavored. The GP should 
30 be easy to harvest, preferably by centrifugation. The 
GP is preferably stable for a temperature range of -70 
to 42°C (stable at 4°C for several days or weeks) ; 
resistant to shear forces found in HPLC; insensitive to 
UV; tolerant of desiccation; and resistant to a pH of 
35 2.0 to 10.0, surface active agents such as SDS or 
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Triton, chaotropes such as 4M urea or 2M guanidinium 
HC1, common ions such as K + / Na + , and S0 4 , common 
organic solvents such as ether and acetone, and 
degradative enzymes. Finally, there must be a suitable 
5 OCV (see Sec. 3) . 

Preferably, the 3 D structure of the OSP, and the 
sequence of the OSP gene p. 47 are known. If the 3D 
structure is not known, there is preferably knowledge 

10 of which residues are exposed on the cell surface, the 
location of the domain boundaries within the OSP, 
and/or of successful fusions of the OSP and a foreign 
insert. The OSP preferably appears in numerous copies 
on the outer surface of the GP, and preferably serves a 

15 non-essential function. It is desirable that the OSP 
not be post translationally processed, or at least that 
this processing be understood. 

The preferred GP, OCV and OSP are those for which 
20 the fewest serious obstacles can be seen, rather than 
the one that scores highest on any one criterion. 

Next, we consider general answers to the questions 
posed in this step for the cases of: a) vegetatively 
25 growing bacterial cells (Sec. 1.1), b) bacterial spores 
(Sec. 1.2), and c) (Sec. 1.3). Preferred OSPs for 
several GPs are given in Table 2. 

Sec, 1.1: Bacterial Cells as Genetic Packages: 

30 

One may choose any well-characterized bacterial 
strain which may be grown in culture. The important 
questions in this case are: a) do we know enough about 
mechanisms that localize proteins on the outside of the 
35 cell, b) will the IPBD fold in the environment of the 
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outer membrane, and c) will cells change expression of 
osp-pbd , derived from osp-ipbd , during affinity 
separation? Some IPBDs may need large or insoluble 
prosthetic groups , such as an Fe 4 S 4 cluster; that are 
available within the cell, but not in the medium- The 
formation of Fe 4 S 4 clusters found in some ferrodoxins 
is catalyzed by enzymes found in the cell (BON085) • 
IPBDs that require such prosthetic groups may fail to 
fold or function if displayed on bacterial cells. 

Sec. 1,1.1: Preferred Bacterial Cells as GP : 

In view of the extensive knowledge of E*. coli, a 
strain of iU coli , defective in recombination, is the 
15 strongest candidate as a bacterial GP- Other preferred 
candidates are Salmonella typhimurium . Bacillus 
subtilis, and Pseudomonas aeruginosa - 

sec. 1,1.2: Prefer red Outer Surface — Proteins — for 
20 Displaying IPBDs on Bacteri al Cells: 

Gram-negative bacteria have outer-membrane 
proteins (OMP) , that form a subset of OSPs. Many OMPs 
span the membrane one or more times. The signals that 

25 cause OMPs to localize in the outer membrane are 
encoded in the amino acid sequence of the mature 
protein. Fusions of fragments of omp genes with 
fragments of an x gene have led to X appearing on the 
outer membrane (BENS84, CLEM81) . If no fusion data are 

30 available, then we fuse an ipbd fragment to various 
fragments of the osp gene and obtain GPs that display 
the osp-ipbd fusion on the cell outer surface by 
screening or selection for the display-of-IPBD 
phenotype. 



5 



10 
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Oliver has reviewed mechanisms of protein 
secretion in bacteria (OLIV85 and 0LT787). Nikaido and 
Vaara (NIKA87) have reviewed mechanisms by which 
proteins become localized to the outer membrane of 
Gram-negative bacteria. For example, the LamB protein 
of EL. coli is synthesized with a typical signal- 
sequence which is subsequently removed. Benson et al. 
(BENS84) showed that LamB-LacZ fusion proteins would be 
deposited in the outer membrane of Ej, coli when 
residues 1-49 of the mature LamB protein are included 
in the fusion, but that residues 1-43 are insufficient. 

LamB of EL. coli is a porin for maltose and 
maltodextrin transport, and serves as the receptor for 
adsorption of bacteriophages lambda and K10. This 
protein has been purified to homogeneity (ENDE78) and 
shown to function as a trimer (PALV79) . Mutations to 
phage resistance have been used to define the parts of 
the LamB protein that adsorb each phage (ROAM80, 
CLEM81, CLEM83, GEHR87) . 

Topological models have been developed that 
describe the function of phage receptor and 
maltodextrin transport. The models describe these 
domains and their locations with respect to the 
surfaces of the outer membrane (CLEM81, CLEM83 , CHAR84, 

HEIN88) . 

LamB is transported to the outer membrane if a 
functional N-terminal sequence is present; further, the 
first 49 amino acids of the mature sequence are 
required for successful transport (BENS84) . Homology 
between parts of LamB protein and other outer membrane 
proteins OmpC, OmpF and PhoE has been detected 
(NIKA84) , including homology between LamB amino acids 
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39-49 and sequences of the other proteins. These 
subsequences may label the proteins for transport to 
the outer membrane. Further, monoclonal antibodies 
derived from mice immunized with purified LamB, have 
5 been used to characterize four distinct topological and 
functional regions , two of which are concerned with 
maltose transport (GABA82) . 

Sec. 1,1-3 Choice of Insertion site for IPBD in 

10 Bacterial Cell OSPt 

For fusions of the phoA into the coding sequence 
for an integral membrane protein, the PhoA domain is 
localized according to where in the integral membrane 

15 protein the ohoA gene was inserted (BECK83 and MAN086) 
That is, if phoA is inserted after an amino acid which 
normally is found in the cytoplasm, then PhoA appears 
in the cytoplasm. If ohoA is inserted after an amino 
acid normally found in the periplasm, however, then the 

20 PhoA domain is localized on the periplasmic side of the 
membrane, and anchored in it. Beckwith and colleagues 
(BECK88) have extended these observations to the lacZ 
gene that can be inserted into genes for integral 
membrane proteins such that the LacZ domain appears in 

25 either the cytoplasm or the periplasm according to 
where the lacZ gene was inserted. 

OSP-IPBD fusion proteins need not fill a 
structural role in the outer membranes of Gram-negative 

30 bacteria because parts of the outer membranes are not 
highly ordered. For large OSPs there is likely to be 
one or more sites at which osp can be truncated and 
fused to ipbd such that cells expressing the fusion 
will display IPBDs on the cell surface. If fusions 

35 between fragments of osp and x have been shown to 
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display X on the cell surface , we can design an osp- 
ipbd gene by substituting ipbd for x in the DNA 
sequence. Otherwise, successful OMP-IPBD fusion is 
preferably sought by fusing fragments of the best omp 
to an ipbd , expressing the fused gene, and testing the 
resultant GPs for display-of-IPBD phenotype. We use 
the available data about OMP to pick the point or 
points of fusion between omp and ipbd to maximize the 
likelihood that IPBD will be displayed- Alternatively, 
we truncate osp at several sites or in a manner that 
produces osp fragments of variable length and fuse the 
osp fragments to ipbd ; cells expressing the fusion are 
screened or selected which display IPBDs on the cell 
surface. An additional alternative is to include short 
segments of random DNA in the fusion of omp fragments 
to ipbd and then screen or select the resulting 
variegated population for members exhibiting the 
display-of-IPBD phenotype. 

The promoter for the osp-ipbd gene, preferably, is 
subject to regulation by a small chemical inducer, such 
as isopropyl thiogalactoside (IPTG) f lac UV5 promoter) . 
It need not come from a natural osp gene; any 
regulatable bacterial promoter can be used (MANI82) . 

Once a genetic packaging system employing 
vegetative bacterial cells has been designed , it is 
time to choose an IPBD (Sec. 2) . 

Sec. 1.1,4: In Vivo Selection for Pseudo-osp Gene From 
Randoiq PffA Iftpeytg 3J3 Bacteyial Cells: 

As an alternative to choosing a natural OSP and an 
insertion site in the OSP, we can construct a gene 
comprising: a) a regulatable promoter f e.g. lacUVS) , b) 
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a Shine-Dalgamo sequence, c) a periplasmic transport 
signal sequence, d) a fusion of the iobd gene with a 
segment of random DNA (as in Kaiser et al^. (KAIS87)), 
e) a stop codon, and f) a transcriptional terminator. 
5 The random DNA, which preferably comprises 90-300 
bases, encode numerous potential OSTS. (EF. KAIS87) 
The fusion of ipbd and the random DNA could be in 
either order, but ipbd upstream is slightly preferred. 
Isolates from the population generated in this way can 

10 be screened for display of the IPBD. Preferably, a 
version of selection-through-binding is used to select 
GPs that display IPBD on the GP surface, and thus 
contain a DNA insert encoding a functional OSTS. 
Alternatively, clonal isolates of GPs may be screened 

15 for the display-of-IPBD phenotype. 

The preference for ipbd upstream of the random DNA 
arises from consideration of the manner in which the 
successful GP(IPBD) will be used. In Part ill, we will 

20 introduce numerous mutations into the pbd region of the 
osp-pbd gene, some of which might include gratuitous 
stop codons. If pbd precedes the random DNA, then 
gratuitous stop codons in pb£ lead to no OSP-PBD 
protein appearing on the cell surface. If pbd follows 

25 the random DNA, then gratuitous stop codons in pbd 
might lead to incomplete OSP-PBD proteins appearing on 
the cell surface. Incomplete proteins often are non- 
specifically sticky so that GPs displaying incomplete 
PBDs are easily removed from the population. 

30 

Sec. 1.2; Displaying IPBD o n bacterial snores: 

Bacterial spores have desirable properties as GP 
candidates. Bacillus spores neither actively 
35 metabolize nor alter the proteins on their surface. 
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However, spores are much more resistant than vegetative 
bacterial cells or phage to chemical and physical 
agents • Spores have the disadvantage that the 
molecular mechanisms that trigger sporulation are less 
5 well worked out than is the formation of M13 or the 
export of protein to the outer membrane of L coli- 

Sec. 1.2,1,: Preferred Bacterial Snores for Use as GPs; 

10 Bacteria of the genus Bacillus form endospores 

that are extremely resistant to damage by heat f 
radiation, desiccation, and toxic chemicals (reviewed 
by Losick et al. (L0SI86) ) . These spores have complex 
structure and morphogenesis that is species-specific 

15 and only partially elucidated- The following 
observations are relevant to the use of Bacillus spores 
as genetic packages* 

Plasmid DNA is commonly included in spores. 

20 Plasmid encoded proteins have been observed on the 
surface of Bacillus spores (DEBR86) . Sporulation 
involves complex temporal regulation that is moderately 
well understood (LOSI86) . The sequences of several 
sporulation promoters are known; coding sequences 

25 operatively linked to such promoters are expressed only 
during sporulation (RAYC87) . 

Donovan et al. have identified several polypeptide 
components of B^. subtil is spore coat (DON087) ; the 

30 sequences of two complete coat proteins and amino- 
terminal fragments of two others have been determined. 
Some components of the spore are synthesized in the 
forespore, e.g. small acid-soluble spore proteins 
(EKRI88), while other components are synthesized in the 

35 mother cell and appear in the spore f e.g* the coat 
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proteins) . This spatial organization of synthesis is 
controlled at the transcriptional level. 



Spores self -assemble, but the signals that cause 
5 various proteins to localize in different parts of the 
spore are not well understood; presumably, the signals 
controlling deposition of the coat proteins from the 
cytoplasm of the mother cell onto the spore coat are 
embedded in the polypeptide sequence . Some, but not 

10 all, of the coat proteins are synthesized as precursors 
and are then processed by specific proteases before 
deposition in the spore coat (D0NO87) . Viable spores 
that differ only slightly from wild- type are produced 
in JL. subtilis even if any one of four coat proteins is 

15 missing (DON087) . Disulfide bonds form within the 
spore (thiol reducing agents are needed to solubilize 
several of the proteins of the coat). The 12kd coat 
protein, CotD, contains 5 cysteines. CotD also 
contains an unusually high number of histidines (16) 

20 and prolines (7). The 113cd coat protein, CotC, 
contains only one cysteine and one methionine. CotC 
has a very unusual amino-acid sequence with 19 lysines 
(K) appearing as 9 K-K dipeptides and one isolated K. 
There are also 20 tyrosines (Y) of which 10 appear as 5 

25 Y-Y dipeptides. Peptides rich in Y and K are known to 
become crosslinked in oxidizing environments (DEV078, 
WAIT83, WAIT86) . CotC contains 16 D and E amino acids 
that nearly equals the 19 Ks. There are no A, F, R, I, 
L, N, P, Q, S, or W amino acids in CotC. Neither CotC 

30 nor CotD is post-translationally cleaved. The proteins 
CotA and CotB are post-translationally cleaved. 
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Enddspores from the genus Bacillus are more stable 
than are exospores from Streotomvces . Bacillus 
subtilis forms spores in 4 to 6 hours, but Streotomvces 
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species may require days or weeks to sporulate. In 
addition, genetic knowledge and manipulation is much 
more developed for subtil is than for other spore- 
forming bacteria. Thus Bacillus spores are preferred 
5 over streotomvces spores. Bacteria of the genus 
Clostridium also form very durable endospores, but 
Clostridia, being strict anaerobes, are not convenient 
to culture. The choice of a species of Bacillus is 
governed by knowledge and availability of cloning 

10 systems and by how easily sporulation can be 
controlled. A particular strain is chosen by the 
criteria listed in Sec. 1.0. Many vegetative 
biochemical pathways are shut down when sporulation 
begins so that prosthetic groups might not be 

15 available. 

Sec. 1.2.2 Preferred outer-surface proteins — for 

Displaying IPBD on Bacterial Snores: 

20 If a spore is chosen as GP, the promoter is the 

most important part of the osp gene, because the 
promoter of a spore coat protein is most active: a) 
when spore coat protein is being synthesized and 
deposited onto the spore and b) in the specific place 

25 that spore coat proteins are being made. In B^ 
subtilis , some of the spore coat proteins are post- 
translationally processed by specific proteases. It is 
valuable to know the sequences of precursors and mature 
coat proteins so that we can avoid incorporating the 

30 recognition sequence of the specific protease into our 
construction of an OSP-IPBD fusion. The sequence of a 
mature spore coat protein contains information that 
causes the protein to be deposited in the spore coat; 
thus gene fusions that include some or all of a mature 
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coat protein sequence are preferred for screening or 
selection for the display-of-IPBD phenotype. 

Fusions of ipbd fragments to cote or cotD 
5 fragments are likely to cause IPBD to appear on the 
spore surface. The genes cotC and cotD are preferred 
osp genes because CotC and CotD are not post- 
translationally cleaved. Subsequences from cotA or 
cotB could also be used to cause an IPBD to appear on 
10 the surface of B*. subtil is spores, but we must take the 
post-translational cleavage of these proteins into 
account. DNA encoding IPBD could be fused to a 
fragment of cotA or cotB at either end of the coding 
region or at sites interior to the coding region. 
15 Spores could then be screened or selected for the 
display-of-IPBD phenotype. 

To date, no Bacillus sporulation promoter has been 
shown to be inducible by an exogenous chemical inducer 
as the lac promoter of JL_ coli . Nevertheless, the 
quantity of protein produced from a sporulation 
promoter can be controlled by other factors, such as 
the DNA sequence around the Shine-Dalgarno sequence or 
codon usage. 

figc. 1.2.3 5 Choice of Insertion site for IPBD in OSP 
of Bacterial Spore: 

The considerations governing insertion site in the 
30 spore OSP are the same as those given in Section 1.1.3. 

Sec, 1.2.4: In Vivo Selection for Pseu do-osp Genes 
From Random DNA Inserts in Bacterial Spores: 
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Although the considerations for spores are nearly 
identical to the considerations for vegetative 
bacterial cells (Sec, 1.1), the available information 
on the mechanisms that cause proteins to appear on 
spores is meager so that use of the random-DMA approach 
becomes a more attractive option. 

We can use the approach described above at 1.1.4 
for attaching an IPBD to an 3L_ coli cell, except that: 
a) a sporulation promoter is used, and b) no 
periplasmic signal sequence should be present. 

Sec. 1.3: Displaying IPBD on Outer Surface of Phages: 

Sec. 1.3.1: Preferred Phages for Use as GPs: 

Unlike bacterial cells and spores, choice of a 
phage depends strongly on knowledge of the 3D structure 
of an 0SP and how .it interacts with other proteins in 
the capsid. The size of the phage genome and the 
packaging mechanism are also important because the 
phage genome itself is the cloning vector. The osp- 
iobd gene must be inserted into the phage genome; 
therefore: 

1) the virion must be capable of accepting the 
insertion or substitution of genetic material, and 

2) the genome of the phage must be small enough to 
allow convenient manipulation. 

Additional considerations in "choosing phage are: 1) 
the morphogenetic pathway of the phage determines the 
environment in which the IPBD will have opportunity to 
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fold, 2) IPBDs containing essential disulfides may not 
fold within a cell, 3) IPBDs needing large or insoluble 
prosthetic groups may not fold if secreted because the 
prosthetic group is lacking, and 4) when variegation is 
5 introduced in Part III, multiple infections could 

generate hybrid GPs that carry the gene for one PBD but 
have at least some copies of a different PBD on their 
surfaces; it is preferable to minimize this 
possibility. 

10 

Bacteriophages are excellent candidates for GPs 
because there is little or no enzymatic activity 
associated with intact mature phage, and because the 
genes are inactive outside a bacterial host, rendering 
15 the mature phage particles metabolically inert. The 
filamentous phage M13 and bacteriophage PhiX174 are of 
particular interest. 



Filamentous phage : 

20 

The entire life cycle of the filamentous phage 
M13, a common cloning and sequencing vector, is well 
understood. M13 and fl are so closely related that we 
consider the properties of each relevant to both 

25 (KASC86) ; any differentiation is for historical 
accuracy. The genetic structure (the complete sequence 
(SCHA78) , the identity and function of the ten genes, 
and the order of transcription and location of the 
promoters) of M13 is well known as is the physical 

30 structure of the virion (BANN81, BOEK80, CHAN79, 
ITOK79, KAPL78, KUHN85b, KUHN87, MAKO80, MARV78, 
MESS78, 0HKA81, RASC86, RDSS81, SCHA78, SMIT85, WEBS78, 
and ZIMM82) ; see RASC86 for a recent review of the 
structure and function of the coat proteins. 
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Relevant facts about M13 are disclosed in Example 

I. 

Bacteriophage PhiX174 : 

The bacteriophage PhiX174 is a very small 
icosahedral virus which has been thoroughly studied by 
genetics, biochemistry , and electron microscopy (See 
The Single-Stranded DNA Phages (DENH78) ) . To date, no 
proteins from PhiX174 have been studied by X-ray 
diffraction. PhiX174 is not used as a cloning vector 
because PhiX174 can accept almost no additional DNA; 
the virus is so tightly constrained that several of its 
genes overlap. Chambers et al. (CHAM82) showed that 
mutants in gene G are rescued by the wild-type G gene 
carried on a plasmid so that the host supplies this 
protein. 

Three gene products of PhiX174 are present on the 
outside of the mature virion: F (capsid) , G (major 
spike protein, 60 copies per virion) , and H (minor 
spike protein, 12 copies per virion). The G protein 
comprises 175 amino acids, while H comprises 328 amino 
acids. The F protein interacts with the single- 
stranded DNA of the virus. The proteins F, G, and H 
are translated from a single mRNA in the viral infected 
cells . 

Large DNA Phages 

Phage such as lambda or T4 have much larger 
genomes than do M13 or PhiX174. Large genomes are less 
conveniently manipulated than small genomes. A phage 
with a large genome, however, could be used if genetic 
manipulation is sufficiently convenient. Phage such as 
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lambda and T4 have more complicated 3D capsid 
structures than M13 or PhiX174 , with more OSPs to 
choose from. Phage lambda virions and phage T4 virions 
form intracellular^, so that IPBDs requiring large or 
5 insoluble prosthetic groups might fold on the surfaces 
of these phage. Phage lambda and phage T4 are not 
preferred, however, derivatives of these phages could 
be constructed to overcome these disadvantages, 

10 RNA Phages 

RNA phage, such as Qbeta, are not preferred 
because manipulation of RNA is much less convenient 
than is the manipulation of DNA. Although competent 
15 RNA bacteriophage are not preferred, useful genetically 
altered RNA-containing particles could be derived from 
RNA phage, such as MS 2. 

To use MS2 as a GP, we would need to eliminate 
20 most of the natural viral genome so that an osp-ipbd 
gene could fit into the protein capsid. It is known 
that the A protein binds sequence-specif ically to a 
site at the 5' end of the + RNA strand triggering 
formation of RNA-containing particles if coat protein 
25 is present. If a message containing the A protein 
binding site and the gene for a chimera of coat protein 
and a PBD were produced in a cell that also contained A 
protein and wild-type coat protein (both produced from 
regulated genes on a plasmid) , then the RNA coding for 
30 the chimeric protein would get packaged. A package 
comprising RNA encapsulated by proteins encoded by that 
RNA satisfies the major criterion that the genetic 
message inside the package specifies something on the 
outside. The particles by themselves are not viable. 
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After isolating the packages that carry an SBD, we 
would need to: 

1) separate the RNA from the protein capsid, 

5 

2) reverse transcribe the RNA into DNA, using AMV 
or MMTV reverse transcriptase , and 

3) amplify the DNA by several cycles of polymerase 
10 chain reaction (PCR) until there is enough to 

subclone the recovered genetic message into a 
plasmid for sequencing and further work. 

Alternatively, helper phage could be used to rescue the 
15 isolated phage. 

Sec, 1.3.2: Preferred Outer-Surface Proteins for 
Displaying IPBDs on Phages; 

20 For a given bacteriophage, the preferred OSP is 

usually one that is present on the phage surface in the 
largest number of copies, as this allows the greatest 
flexibility in varying the ratio of OSP-IPBD to wild 
type OSP and also gives the highest likelihood of 

25 obtaining satisfactory affinity separation. Moreover, 
a protein present in only one or a few copies usually 
performs an essential function in morphogenesis or 
infection; mutating such a protein by addition or 
insertion is likely to result in reduction in viability 

30 of the GP. 

It is preferred that the wild- type oso gene be 
preserved. The ipbd gene fragment may be inserted 
either into a second copy of the recipient osp gene or 
35 into a novel engineered osp gene. The preferred OSP 
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for use when the GP is M13 is the gene III protein (see 
Example 1) . 

Sec. 1.3.3: Choice of Insertion site for IP BD in OSP: 

The user must choose a site in the candidate OSP 
gene for inserting a ipbd gene fragment. The coats of 
most bacteriophage are highly ordered. Thus in 
bacteriophage, unlike the cases of bacteria and spores, 
it is important to retain most or all of the residues 
of the parental OSP in engineered OSP-IPBD fusion 
proteins. A preferred site for insertion of the ipbd 
gene into the phage osp gene is one in which: a) the 
IPBD folds into its original shape, b) the OSP domains 
fold into their original shapes, and c) there is no 
interference between the two domains. 

If there is a 3D model of the phage that indicates 
that either the amino or carboxy terminus of an OSP is 
exposed to solvent, then the exposed terminus of that 
mature OSP becomes the prime candidate for insertion of 
the ipbd gene. A low resolution 3D model suffices. 

In the absence of a 3D structure, the amino and 
carboxy termini of the mature OSP are the best 
candidates for insertion of the ipbd gene. A 
functional fusion may require additional residues 
between the IPBD and OSP domains to avoid unwanted 
interactions between the domains. Random-sequence DNA 
or DNA coding for a specific sequence of a protein 
homologous to the IPBD or OSP, can be inserted between 
the ose fragment and the ipbd fragment if needed. 

Fusion at a domain boundary within the OSP is also 
a good approach for obtaining a functional fusion. 
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Smith exploited such a boundary when subcloning 
heterologous DNA into gene III of fl (SMIT85) . 



There are several methods of identifying domains. 
5 Methods that rely on atomic coordinates have been 
reviewed by Janin and Chothia (JANI85) see also ROSE85, 
RASH84, VXTA84 f PAB079 f POTE83, and SCOT87. 

If the only structural information available is 
10 the amino acid sequence of the candidate OSP, we use 
the sequence to predict turns and loops. There is a 
high probability that some of the loops and turns will 
be correctly predicted ( cf . Chou and Fasman, (CHOU72)) ; 
these locations are also candidates for insertion of 
15 the ipbd gene fragment. 



Sec, 1.3.4: In Vivo Selection for Pseudo-OSP Gene ftrom 

Random DNA Inserts in Bacterial Spores: 

20 Alternatively , a functional insertion site may be 

determined by generating a number of recombinant 
constructions and selecting the functional strain by 
phenotypic characteristics. Because the OSP-IPBD must 
fulfill a structural role in the phage coat, it is 

25 unlikely that any particular random DNA sequence 
coupled to the ipbd gene will produce a fusion protein 
that fits into the coat in a functional way. 
Nevertheless, random DNA inserted between large 
fragments of a coat protein gene and the ipbd gene will 

30 produce a population that is likely to contain one or 
more members that display the IPBD on the outside of a 
viable phage. A display probe, similar to that defined 
in 1.1.4, is constructed and random DNA sequences 
cloned into appropriate sites. 
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sac- 2t Choic e of IPBD : 

An IPBD may be chosen from naturally occurring 
proteins or domains of naturally occurring proteins, or 
5 may be designed from first principles, A designed 
protein may have advantages over natural proteins if: 
a) the designed protein is more stable, b) the designed 
protein is smaller, and c) the charge distribution of 
the designed protein can be specified more freely. 

10 

A candidate IPBD must meet the following criteria: 
1) stablility under the conditions of its intended use 
(the domain may comprise the entire protein that will 
be inserted, e.g. BPTI) , 2) knowledge of the amino acid 
15 sequence is obtainable, 3) identification of the 
residues on the outer surface, and their spatial 
relationships, and 4) availability of a molecule, 
AfM(IPBD) having high specific affinity for the IPBD. 

.20 Preferably, the IPBD is no larger than necessary 

because it is easier to arrange restriction sites in 
smaller amino-acid sequences. The usefulness of 
candidate IPBDs that meet all of these requirements 
depends on the availability of the information 

25 discussed below. 

Information used to judge IPBD suitability 
includes: 1) a 3D structure (knowledge strongly 
preferred) , 2) one or more sequences homologous to the 

30 IPBD (the more homologous sequences known, the better) , 
3) the pi of the IPBD (knowledge necessary in some 
cases) , 4) the stability and solubility as a function 
of temperature, pH and ionic strength (preferably known 
to be stable over a wide range and soluble in 

35 conditions of intended use) , 5) ability to bind metal 
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ions such as Ca** or Mg ++ (knowledge preferred; binding 
per se, no preference) , 6) enzymatic activities f if any 
(knowledge preferred , activity per se has uses but may 
cause problems) , 7) binding properties, if any 
(knowledge preferred , specific binding also preferred) , 
8) availability of a molecule having specific and 
strong affinity ( Ka < 10 -11 M) for the IPBD 
(preferred) , 9) availability of a molecule having 
specific and medium affinity ( 10~ 8 M < Kfl < 10" 6 M) 
for the IPBD (preferred) , 10) the sequence of a mutant 
of IPBD that does not bind to the affinity molecule (s) 
(preferred) , and 11) absorption spectrum in visible, 
UV, NMR, etc, (characteristic absorption preferred) • 

If only one species of molecule having affinity 
for IPBD (AfM(IPBD) ) is available, it will be used to: 
a) detect the IPBD on the GP surface, b) optimize 
expression level and density of the affinity molecule 
on the matrix (Sec. 10.1), and c) determine the 
efficiency and sensitivity of the affinity separation 
(Sees . 10 • 2 and 10.3). As noted above , however, one 
would prefer to have available two species of 
AfM(IPBD) , one with high and one with moderate affinity 
for the IPBD. The species with high affinity would be 
used in initial detection and in determining efficiency 
and sensitivity (10.2 and 10.3), and the species with 
moderate affinity would be used in optimization (10.1). 

For at least 20 candidate IPBDs the above 
information is available or is practical to obtain, for 
example , bovine pancreatic trypsin inhibitor (BPTI, 58 
residues) , crambin (46 residues) , third domain of 
ovomucoid (56 residues) , T4 lysozyme (164 residues) , 
and azurin (128 residues) . 
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Most of the PBDs derived from a PPBD according to 
the process of the present invention affect residues 
having side groups directed toward the solvent. 
Exposed residues can accept a wide range of amino 
5 acids, while buried residues are more limited in this 
regard (REID88) . Surface mutations typically have only 
small effects on melting temperature of the PBD, but 
may reduce the stability of the PBD. Hence the chosen 
IPBD should have a high melting temperature (60°C 

10 acceptable, the higher the better) and be stable over a 
wide pH range (8.0 to 3.0 acceptable; 11.0 to 2.0 
preferred) , so that the SBDs derived from the chosen 
IPBD by mutation and selection-through-binding will 
retain sufficient stability. Preferably, the 

15 substitutions in the IPBD yielding the various PBDs do 
not reduce the melting point of the domain below 50°C. 

Two general characteristics of the target 
molecule, size and charge, make certain classes of 

20 IPBDs more likely than other classes to yield 
derivatives that will bind specifically to the target. 
Because these are very general characteristics, one can 
divide all targets into six classes: a) large positive, 
b) large neutral, c) large negative, d) small positiye, 

25 e) small neutral, and f) small negative. A small 
collection of IPBDs, one or a few corresponding to each 
class of target, will contain a preferred candidate 
IPBD for any chosen target. 

30 Alternatively, the user may elect to engineer a 

GP(IPBD) for a particular target? Sec 2.1 gives 
criteria that relate target size and charge to the 
choice of IPBD. 

35 Sec. 2.1: Influence of target size on c hoice of IPBD: 
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If the target is a protein or other macromolecule 
a preferred embodiment of the IPBD is a small protein 
such as BPTI from Bos taurus (58 residues) , crambin 
5 from rape seed (46 residues) , or the third domain of 
ovomucoid from Cotumix coturnix Japonica (Japanese 
quail) (56 residues) (PAPA82) , because targets from 
this class have clefts and grooves that can accommodate 
small proteins in highly specific ways. If the target 
10 is a macromolecule lacking a compact structure, such as 
starch, it should be treated as if it were a small 
molecule. Extended macromolecules with defined 3D 
structure, such as collagen, should be treated as large 
molecules. 

15 

If the target is a small molecule, such as a 
steroid , a preferred embodiment of the IPBD is a 
protein the size of ribonuclease from Bos taurus (124 
residues) , ribonuclease from Aspergillus orvzae (104 

20 residues) , hen egg white lysozyme from Gallus crallus 
(129 residues), azurin from Pseudomonas aeruginosa (128 
residues) , or T4 lysozyme (164 residues) , because such 
proteins have clefts and grooves into which the small 
target molecules can fit. The Brookhaven Protein Data 

25 Bank contains 3D structures for these proteins. Genes 
encoding proteins as large as T4 lysozyme can be 
manipulated by standard techniques for the purposes of 
this invention. 

30 If the target is a mineral, insoluble in water, 

one must consider the nature of the mineral's molecular 
surface. Smooth surfaces, (such as crystalline 
silicon) require medium to large proteins (such as 
ribonuclease) as IPBD in order to have sufficient 

35 contact area and specificity. Rough, grooved surfaces 
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(zeolites) , could be bound either by small proteins 
(BPTI) or larger proteins (T4 lysozyme) . 

sec. 2.2: Influence of t arget charge on choice of 
IPBD; 

Electrostatic repulsion between molecules of like 
charge can prevent molecules with highly complementary 
surfaces from binding. Therefore, it is preferred 
that, under the conditions of intended use, the IPBD 
and the target molecule either have opposite charge or 
that one of them is neutral. Inclusion of counter ions 
can reduce or eliminate electrostatic repulsion. 

Sec. 2.3: Other aspects of choice of IPBD: 

If the chosen IPBD is an enzyme, it may be 
necessary to change one or more residues in the active 
site to inactivate enzyme function. For example, if 
the IPBD were T4 lysozyme and the GP were E*. coli cells 
or M13, we would inactivate the lysozyme lest it lyse 
the cells. If, on the other hand, the GP were PhiX174, 
then inactivation of lysozyme may not be needed because 
T4 lysozyme can be overproduced inside coli cells 
without detrimental effects and PhiX174 forms 
intracellular ly. It is preferred to inactivate enzyme 
IPBDs that might be harmful to the GP or its host by 
substituting mutant amino acids at one or more residues 
of the active site. It is permitted to vary one or 
more of the residues that were changed to abolish the 
original enzymatic activity of the IPBD. Those GPs 
that receive osp-pbd genes encoding an active enzyme 
may die, but the majority of sequences will not be 
deleterious . 
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Sec. 3: Choice of OCV: 

The OCV is preferably small, e.g., less than 10 
KB. It is desirable that cassette mutagenesis be 
5 practical in the OCV; preferably, at least 25 
restriction enzymes are available that do not cut the 
OCV. It is likewise desirable that single-stranded 
mutagenesis be practical. Finally, the OCV preferably 
carries a selectable marker. A suitable OCV is 

10 obtained or is engineered by manipulation of available 
vectors. Plasmids are preferred over the bacterial 
chromosome because genes on plasmids are much more 
easily constructed and mutated than are chromosomal 
genes. When bacteriophage are to be used, the osp-ipbd 

15 gene must be inserted into the phage genome. 

For phage such as M13, an antibiotic resistance 
gene is engineered into the genome (HINE80) . More 
virulent phage, such as PhiX174, make discernable 

20 plaques that can be picked, in which case a resistance 
gene is not essential; furthermore, there is no room in 
the PhiX174 virion to add any new genetic material. 
Inability to include an antibiotic resistance gene is a 
disadvantage because it limits the number of GPs that 

25 can be screened. 

It is preferred that GP(IPBD) carry a selectable 
marker not carried by wtGP. It is also preferred that 
wtGP carry a selectable marker not carried by GP(IPBD). 

30 

Sec. 4: Designing the osp-ipbd gene insert: 

We design an amino acid sequence that will cause 
35 the IPBD to appear on the GP surface when it is 
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expressed. This amino acid sequence may determine the 
entire coding region of the osp-ipbd gene, or it may 
contain only the ipbd sequence adjoining restriction 
sites into which random DNA will be cloned (Sec. 6.2). 

The actual gene may be produced by any means. The 
pbd segment, derived from the ipbd segment, must be 
easily genetically manipulated in the ways described in 
Part III. Synthetic ipbd segments are preferred 
because they allow greatest control over placement of 
restriction sites. 

Sec. 4.1 Genetic regulation of t he osn-ipbd gene: 

Regarding regulation of the osp-ipbd gene, the two 
important questions are: a) how much OSP-IPBD do we 
need on each GP, and b) how accurately must we regulate 
the amount? 

The essential function of the affinity separation 
is to separate GPs that bear PBDs (derived from IPBD) 
having high affinity for the target from GPs bearing 
PBDs having low affinity for the target. If a gradient 
of some solute, such as increasing salt, changes the 
conditions, then all weakly-binding PBDs will cease to 
bind before any strongly-binding PBDs cease to bind. 
Regulation of the osp-pbd gene must be such that all 
packages display sufficient PBD to effect a good 
separation in Sec 15. If the amount of PBD/GP had an 
effect on the elution volume of the GP from the 
affinity matrix, then we would need to regulate the 
amount of PBD/GP accurately. The following analysis 
shows that there is no strong linear effect of IPBD/GP 
on elution volume and assumes only: a) that all GPs are 
the same size, b) that interactions between the PBDs 
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and the affinity matrix dominate differential elution 
of GPs, c) that the system is at equilibrium, and d) 
that all PBDs on any one GP are identical. 

5 If Np identical PBDs on a GP each have access to 

target molecules, and each PBD has a free-energy of 
binding to the target of delta G b , then the total free 
energy of binding is 

10 delta G b tot = N p * delta G b . 

Delta G b is a function of parameters of the solvent, 
such as: 1) concentration of ions, 2) pH, 3) 
temperature, 4) concentration of neutral solutes such 

15 as sucrose, glucose, ethanol, etc, . 5) specific ions, 
such as, calcium, acetate, benzoate, nicotinate, etc. 
If conditions are altered during affinity separation so 
that delta G^ approaches zero, delta Gk tot approaches 
zero Np times faster. As delta G b tot goes to or above 

20 zero, the packages will dissociate from the immobilized 
target molecules and be eluted. 

GPs bearing more PBDs have a sharper transition 
between bound and unbound than packages with fewer of 
25 the same PBDs. For equilibrium conditions, the mid- 
point of the transition is determined only by the 
solution conditions that bring the individual 
interactions to zero free-energy. The number of 
PBDs/GP determines the sharpness of the transition. 

30 

It should also be noted that the number of PBDs/GP 
is usually influenced by physiological conditions so 
that a sample of genetically identical GP(PBD)s may 
contain GPs having different numbers of PBDs on the GP 
35 surface. In a population of GP(vgPBD)s each PBD 
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sequence will appear on more that one GP, and the 
actual number of PBDs/GP will vary from GP to GP within 
some range. Within a variegated population of PBDs, 
let PBD X be the PBD with maximum affinity for the 
5 target. If there is a linear effect on elution volume 
of number of PBDs/GP, then the GPs having the greatest 
number of PBD X will be most retarded on the column. 
When we culture the enriched population the GP(PBD X ) 
• will be amplified and give rise to new GP(PBD x )s having 

10 varying numbers of PBDx/GP. Thus the affinity 
separation process of the present invention could 
tolerate a linear effect of number of PBDs/GP on the 
elution volume of the GP(PBD) unless strong binding to 
target fortuitously causes the PBD to be displayed on 

15 the GP only in low number. 

Since there is no linear effect on elution volume 
from the number of IPBDs/GP, need for highly accurate 
regulation of IPBD/GP is not anticipated. Reproducible 

20 gene expression is more easily controlled using 
regulated rather than constitutive genetic elements. 
The analysis above assumes that GP(IPBD)s are in 
equilibrium between solution in buffer and bound to the 
affinity matrix. Rate of elution may be an important 

25 parameter in column affinity chromatography. In batch 
elution from an affinity matrix or elution from an 
affinity plate, the time that each buffer is in contact 
with the affinity material may be an important 
variable. The density of affinity molecules on the 

30 matrix is an important variable in optimizing the 
affinity separation. Because the analysis above is 
qualitative, in Sec. 10 of the preferred embodiment we 
experimentally optimize: 1) the density of IPBD on the 
GP surface, 2) the density of affinity molecules on the 

35 affinity matrix, 3) the initial ionic strength, 4) the 
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elution rate, and 5) the quantity of GP/ (volume of 
matrix) to be loaded on the column. 

Transcriptional regulation of gene expression is 
best understood and most effective, so we focus our 
attention on the promoter. A number of promoters are 
known that can be controlled by specific chemicals 
added to the culture medium. For example, the lacUVS 
promoter is induced if isopropylthiogalactoside is 
added to the culture medium, for example, at between 
1.0 uM and 10.0 mM. Hereinafter, we use "XINDUCE" as a 
generic term for a chemical that induces expression of 
a gene. If transcription of the osp-ipbd gene is 
controlled by XINDUCE, then the number of OSP-IPBDs per 
GP increases for increasing concentrations of XINDUCE 
until a fall-off in the number of viable packages is 
observed or until sufficient IPBD is observed on the 
surface of harvested GP(IPBD)s. 

The attributes that affect the maximum number of 
OSP-IPBDs per GP are primarily structural in nature. 
There may be steric hindrance or other unwanted 
interactions between IPBDs if OSP-IPBD is substituted 
for every wild-type OSP. Excessive levels of OSP-IPBD 
may also adversely affect the solubility or 
morphogenesis of the GP. For cellular and viral GPs, 
as few as five copies of a protein having affinity for 
another immobilized molecule have resulted in 
successful affinity separations (FERE82a, FERE82b, and 
SMIT85) . 

Another consideration of promoter regulation is 
that it is useful later to know the range of regulation 
of the osp-ipbd . (Sec. 8) In particular, one should 
determine how nearly the absence of XINDUCE leads to 
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the absence of IPBD on the GP surface; a non-leaky 
promoter is preferred. Non-leakiness is useful: a) to 
show that affinity of GPf osp-ipbd ) s for AfM(IPBD) is 
due to the osp-ipbd gene, and b) to allow growth of 
5 GP( osp-pbd ) in the absence of XINDUCE if the expression 
of osp-pbd is disadvantageous. The lacUV5 promoter in 
conjunction with the LaclS repressor is a preferred 
example. 

10 Sec. 4.2: DNA sequence design: 

The present invention is not limited to a single 
method of gene design. The following procedure is an 
example of one method of gene design that fills the 
15 needs of the present invention. 

If the amino-acid sequence of OSP-IPBD is a 
definite sequence, then the entire gene will be 
constructed (Sec. 6.1). If random DNA is to be fused 
20 to ipbd . then a "display probe" is constructed first; 
the random DNA is then inserted to complete the 
population of putative osp-ipbd genes (Sec. 6.2) from 
which a functional osp-ipbd gene is identified by in 
vivo selection or kindred techniques. 

25 

One may use any genetic engineering method to 
produce the correct gene fusion, so long as one can 
easily and accurately direct mutations to specific 
sites in the pbd DNA subsequence (Sec. 14.1). For the 

30 methods of mutagenesis considered here, however, the 
DNA sequence for the osp-ipbd gene must be different 
from any other DNA in the OCV. The degree and nature 
of difference needed is determined by the method of 
mutagenesis. One replaces subsequences coding for the 

35 PBD with vgDNA, then subsequences to be mutagenized 



WO 90/02809 



PCT/US89/03731 



60 

must, be bounded by restriction sites that are unique 
within the OCV. If single-stranded-oligonucleotide- 
directed mutagenesis is to be used, then the DNA 
sequence of the subsequence coding for the IPBD must be 
5 unique within the OCV. 

Regulatory elements include: a) promoters, b) 
Shine-Dalgarno sequences, and c) transcriptional 
terminators, and may be isolated from nature or 
10 designed from knowledge of consensus sequences of 
natural regulatory regions. 

The coding portions of genes to be synthesized are 
designed at the protein level and then encoded in DNA. 

15 The amino acid sequences are chosen to achieve various 
goals, including: . a) display of a IPBD on the surface 
of a GP, b) change of charge on a IPBD, and c) 
generation of a population of PBDs from which to select 
an SBD. The ambiguity in the genetic code is exploited 

20 to allow optimal placement of restriction sites and to 
create various distributions of amino acids at 
variegated codons. 

Sec. 4.3: Specific DNA sequence assignment; 

25 

A computer program may be used to identify all 
possible ambiguous DNA sequences coding for an amino- 
acid sequence given by the user and to identify places 
where recognition sites for site-specific restriction 
30 enzymes could be provided without altering the amino- 
acid sequence. 

Restriction sites are positioned within the osp- 
ipbd gene so that the longest segment between sites is 
35 as short as possible. Enzymes the produce cohesive 
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ends are preferred. The codon preferences of the 
intended host and the secondary structure of the 
messenger KNA are also considered. 

5 fi^c. 5.1: Organiz ation of gene synthesis: 

An established strategy for gene synthesis is to 
synthesize both strands of the entire gene in 
overlapping segments of 20 to 50 nucleotides (nts) 

10 (THER88) . We prefer an alternative method that is more 
suitable for synthesis of vgDNA. Our method differs 
from previous methods (OLIP86, OLIP87, AUSU87) in that 
we: a) use two synthetic strands, and b) do not cut the 
extended DNA in the middle. Our goals are: a) to 

15 produce longer pieces of dsDNA than can be synthesized 
as ssDNA on commercial DNA synthesizers, and b) to 
produce strands complementary to single-stranded vgDNA. 
By using two synthetic strands, we remove the 
requirement for a palindromic sequence at the 3' end. 

20 

DNA synthesizers can produce oligo-nts of up to 
100 nts in reasonable yield, M DNA = 100. The 
parameters N w (the length of overlap needed to obtain 
efficient annealing) and N s (the number of spacer bases 
25 needed so that a restriction enzyme can cut near the 
end of blunt-ended dsDNA) are determined by DNA and 
enzyme chemistry. N w = 10 and N s = 5 are reasonable 
values . 

30 We divide the DNA sequence to be synthesized into 

two nearly equal parts, each 5-8 bases longer than half 
the total length, so that there is an overlap between 
the two parts of 10 to 16 bp (Nw) containing no 
variegated bases. The overlap preferably, is not 

35 palindromic and has high GC content. We synthesize the 
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overlap portion and the 5' extension of each strand. 
When these strands are annealed and completed with 
Klenow enzyme and all four NTPs, we obtain the desired 
sequence as blunt-ended dsDNA. If the DNA is to be 
5 ligated to other DNA having cohesive ends, five to ten 
(Ns) bases are added to that end. The synthetic dsDNA 
can then be cut efficiently with an appropriate 
restriction enzyme (OLIP87) . 

10 Because M DNA is not rigidly fixed at 100, the 

current limits of 190 (=2 M DNA - N w ) nts overall and 
100 in each fragment are not rigid, but can be exceeded 
by 5 or 10 nts. Going beyond the limits of 190 and 100 
will lead to lower yields, but these may be acceptable 

15 in certain cases. 

Sec. 5.2: DNA synth esis and purification methods : 

The present invention is not limited to any 
20 particular method of DNA synthesis or construction. 

In the preferred embodiment, DNA is synthesized by 
standard means on a Milligen 7500 DNA synthesizer. The 
Milligen 7500 has seven vials from which 

25 phosphoramidites may be taken. Normally, the first 
four contain A, C, T, and G. The other three vials 
may contain unusual bases such as inosine or mixtures 
of bases, the so-called "dirty bottle". The standard 
software allows programmed mixing of two, three, or 

30 four bases in equimolar quantities. 

The present invention is not limited to any 
particular method of purifying DNA for genetic 
engineering. Agarose gel electrophoresis and 
35 electroelution on an IBI device (International 



WO 90/02809 



PCT/US89/03731 



63 

Biotechnologies, Inc., New Haven, CT) is, preferably, 
used to purify large dsDNA fragments. For oligo-nts, 
PAGE and electroelution with an Epigene device (Epigene 
Corp., Baltimore, MD) are an alternative to HPLC. 

5 

sac. 6.1: Cloning of Known OSP-iobd ge ne into OCV; 

In the preferred method, the synthetic gene is 
constructed using plasmids that are transformed into 

10 bacterial cells by standard methods (MANI82, p250) or 
slightly modified standard methods. Alternatively, DNA 
fragments derived from nature are operably linked to 
other fragments of DNA derived from nature or to 
synthetic DNA fragments. In most cases of the 

15 preferred method, gene synthesis involves construction 
of a series of plasmids containing larger and larger 
segments of the complete gene. 

Sec. 6.2 Cloning of Random DNA ( Pote ntial oso) Into 
20 Display Probe; 

If random DNA and phenotypic selection or 
screening are used to obtain a GP(IPBD) , then we clone 
random DNA into one of the restriction sites that was 
25 designed into the display probe. 

The random DNA may be obtained in a variety of 
ways. Degenerate synthetic DNA is one possibility. 
Alternatively, pseudorandom DNA may be taken from 

30 nature. If, for example, an SEk * site (GCATG/C) has 
been designed into the display probe at one end of the 
jpbd fragment, then we would use Nla III (CATG/) to 
partially digest DNA that contains a wide variety of 
sequences, generating a wide variety of fragments with 

35 CATG 3' overhangs. Preferably, the display probe has 
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different restriction sites at each end of the ipbd 
gene so that random DNA can be cloned at either end. 

A plasmid carrying the display probe is digested 
5 with the appropriate restriction enzyme and the 
fragmented, random DNA is annealed and ligated by 
standard methods. The ligated plasmids are used to 
transform cells that are grown and selected for 
expression of the antibiotic-resistance gene. Plasmid- 
10 bearing GPs are then selected for the display-of-IPBD 
phenotype by the procedure given in Sec. 15 of the 
. present invention using AfM(IPBD) as if it were the 
target. Sec. 15 is designed to isolate GP(PBD)s that 
bind to a target from a large population that do not 
15 bind. 

Sec. 7: Harvest of GPs t 

Cells are transformed with ligated OCVs and 
20 selected for uptake of OCV after an appropriate 
incubation with an agent appropriate to the selectable 
markers on the OCV. GPs are harvested by methods 
appropriate to the GP at hand, generally, 
centrifugation to pelletize GPs and resuspension of the 
25 pellets in sterile medium (cells) or buffer (spores or 
phage) ♦ 

Sec. 8: Verification of Display Strategy: 

30 The harvested packages are now tested for display 

of IPBD on the surface; any ions or cof actors known to 
be essential for the stability of IPBD or 'AfM(IPBD) 
must be included at appropriate levels. The tests can 
be done: a) by affinity labeling, b) enzymatically, c) 

35 spectrophotometrically, d) by affinity separation, or 
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e) by affinity precipitation. The AfM(IPBD) in this 
step is one picked to have strong affinity 
(preferably, % < 10" 11 M) for the IPBD molecule and 
little or no affinity for the wtGP. For example, if 
5 BPTI were the IPBD, trypsin, anhydrotrypsin, or 
antibodies to BPTI could be used as the AfM(BPTI) to 
test for the presence of BPTI. Anhydrotrypsin, a 
trypsin derivative with serine 195 converted to 
dehydroalanine, has no proteolytic activity but retains 
10 its affinity for BPTI (AKOH72 and HUBE77). 

Preferably, the presence of the IPBD on the 
surface of the GP is demonstrated through the use of a 
soluble, labeled derivative of a AfM(IPBD) with high 
15 affinity for IPBD. The labeled derivative of AfM(IPBD) 
is denoted as AfM(IPBD) *. 

If random DNA has been used, then the procedures 
of Sec. 15 are used to obtain a clonal isolate that has 
20 the display-of-IPBD phenotype. Alternatively, clonal 
isolates may be screened for the display-of-IPBD 
phenotype. The tests of this step are applied to one 
or more of these clonal isolates. 

25 If no isolates that bind to the affinity molecule 

are obtained we take corrective action as disclosed in 
Sec . 9 . 

If one or more of the tests indicates that the 
30 IPBD is displayed on the GP surface, we verify that the 
binding of molecules having known affinity for IPBD is 
due to the chimeric osp-iobd gene through the use of 
standard genetic and biochemical techniques, such as: 
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1) transferring the osp-inbd gene into the parent 
GP to verify that osp-ipbd confers binding, 

2) deleting the osp-ipbd gene from the isolated GP 
to verify that loss of osp-ipbd causes loss of 
binding, 

3) showing that binding of GPs to AfM(IPBD) 
correlates with [XINDDCE] (in those cases that 
expression of osp-ipbd is controlled by 
[XINDUCE]) , and 

4) showing that binding of GPs to AfM(IPBD) is 
specific to the immobilized AfM(IPBD) and not to 
the support matrix. 

Presence of IPBD on the GP surface is indicated by 
a strong correlation between [XINDUCE] and the 
reactions that are linear in the amount of IPBD (such 
as: a) binding of GPs by soluble AfM(IPBD) *, b) 
absorption caused by IPBD, and c) biochemical reactions 
of IPBD). The demonstration (4) that binding is to 
AfM(IPBD) and the genetic tests (1) and (2) are 
important; the test with XINDUCE (3) is less so. 

We sequence the relevant ipbd gene fragment from 
each of several clonal isolates to determine the 
construction. 



We establish the maximum salt concentration and pH 
range for which the GP(IPBD) binds the chosen 
AfM(IPBD) 4 
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If the IPBD is displayed on the outside of the GP, 
and if that display is clearly caused by the introduced 
osp- ipbd gene, we proceed to Part II, otherwise we must 
analyze the result and adopt appropriate corrective 
5 measures . 

Sec. 9: Perfecting the Disp lay System: 

If we have attempted to fuse an ipbd fragment to a 
10 natural osp fragment, our options are : 

1) pick a different fusion to the same osp by 

a) using opposite end of osp, 

b) keeping more or fewer residues from osp in 
15 the fusion; for example, in increments of 3 

or 4 residues, 

c) trying a known or predicted domain 
boundary, 

d) trying a predicted loop or turn position, 

20 

2) pick a different ose, or 

3) switch to random DNA method. 

25 If we have just tried the random DNA method 

unsuccessfully, our options are : 

1) choose a different relationship between ipbd 
fragment and random DNA ( ipbd first., random DNA 

30 second or vice versa ) , 

2) try a different degree of partial digestion, a 
different enzyme for partial digestion, a 
different degree of shearing or a different source 

35 of natural DNA, or 
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3) switch to the natural OSP method. 

If all reasonable OSPs of the current GP have been 
5 tried and the random DNA method has been tried, both 
without success, we pick a new GP. 

Part II 

10 Sec. 10.0; Affinity Separation Means: 

In Part II we optimize an affinity separation 
system that will be used in Part III to enrich a 
population of GP(vgPBD)s for those GP(PBD)s that 
15 display PBDs with increased affinity for the target. 

Affinity chromatography is the preferred means, 
but FACS, electrophoresis, or other means may also be 
used. 

20 

Sec. 10.1; Optimization of Affinity Chromatography 
Separation; 

Changes in eluant concentration cause GPs to elute 
25 from the column. Elution volume, however, is more 
easily measured and specified. It is to be understood 
that the eluant concentration is the agent causing GP 
release and that an eluant concentration can be 
calculated from an elution volume and the specified 
30 gradient. 

Using a specified elution tegime, we compare the 
elution volumes of GP(IPBD)s with the elution volumes 
of wtGP on affinity columns supporting AfM(IPBD) . 
35 Comparisons are made at various: a) amounts of IPBD/GP, 
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b) densities of AfM (IPBD) / (volume of matrix) (DoAMoM) , 

c) initial ionic strengths, d) elution rates, e) 
amounts of GP/ (volume of support), f) pHs, and g) 
temperatures, because these are the parameters most 

5 likely to affect the sensitivity and efficiency of the 
separation. We then pick those conditions giving the 
best separation. 

We do not optimize pH or temperature; rather we 
10 record optimal values for the other parameters for one 
or more values of pH and temperature. The conditions 
of intended use, specified by the user (Sec. 11), may 
include a specification of pH or temperature. If pH is 
specified, then pH will not be varied in eluting the 
15 column (Sec. 15.3). Decreasing pH may be used to 
liberate bound GPs from the matrix. If the intended 
use specifies a temperature, we will hold the affinity 
column at the specified temperature during elution, but 
we might vary the temperature during recovery. 

20 

The AFM (IPBD) is preferably one known to have 
moderate affinity for the IPBD (Kd in the range 10~ 6 M 
to 10" 8 M) . When populations of GP(vgPBD)s are 
fractionated, there will be roughly three 

25 subpopulations; a) those with no binding, b) those that 
have some binding but can be washed off with high salt 
or low pH, and c) those that bind very tightly and must 
be rescued in situ . We optimize the parameters to 
separate (a) from (b) rather than (b) from (c) . Let 

30 PBDv be a PBD having weak binding to the target and 
PBD S be a PBD having strong binding. Higher DoAMoM 
might, 'for example, favor retention of GP(PBD W ) but 
also make it very difficult to elute viable GP(PBD S ) . 
We will optimize the affinity separation to retain 
. 35 GP(PBD W ) rather than to allow release of GP(PBD S ) 
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because a tightly bound GP(PBD S ) can be rescued by in 
situ growth. If we find that DoAMoM strongly affects 
the elution volume, then in part III we may reduce the 
amount of target on the affinity column when an SBD has 
5 been found with moderately strong affinity (Ka on the 
order of 10" 7 M) for the target. 

In this step, we measure elution volumes of 
genetically pure GPs that elute from the affinity 
10 matrix as sharp bands that can be detected by UV 
absorption. Samples from effluent fractions are plated 
on suitable medium (cells or spores) or on sensitive 
cells (phage) and colonies or plaques counted. 

15 Several values of IPBD/GP, DoAMoM, elution rates, 

initial ionic strengths, and loadings should be 
examined. We anticipate that optimal values of IPBD/GP 
and DoAMoM will be correlated and therefore should be 
optimized together. The effects of initial ionic 

20 strength, elution rate, and amount of GP/ (matrix 
volume) are unlikely to be strongly correlated, and so 
they can be optimized independently. 

For each set of parameters to be tested, the 
25 column is eluted in a specified manner. For example, 
we may use a regime called Elution Regime 1: a KC1 
gradient runs from lOmM to maximum allowed for the 
GP(IPBD) viability in 100 fractions of 0.05 V v (void 
volume) , followed by 20 fractions of 0.05 V v at maximum 
30 allowed KC1; pH of the buffer is maintained at the 
specified value with a convenient buffer such as Tris. 
It is important that the conditions of this 
optimization be similar to the conditions that are used 
in Part III for selection for binding to target (Sec. 
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15.3) and recovery of GPs from the chromatographic 
system (Sec. 15.4). 

When the oso-ipbd gene is regulated by [XINDUCE], 
5 IPBD/GP can be controlled by varying [XINDUCE] . 
Appropriate values of [XINDUCE] depend on the identity 
of [XINDUCE] and the promoter; if, for example, XINDUCE 
is isopropylthiogalactoside (IPTG) and the promoter is 
lacUVS . then [IPTG] = 0, 0,1 uM, 1.0 uM, 10.0 uM, 100.0 
10 uM, and 1.0 mM are appropriate levels to test. The 
range of variation of [XINDUCE] is extended until an 
optimum is found or an acceptable level of expression 
is obtained. 

15 DoAMoM is varied from the maximum that the matrix 

material can bind to 1% or 0.1% of this level in 
appropriate steps. We anticipate that the efficiency 
of separation will be a smooth function of DoAMoM so 
that it is appropriate to cover a wide range of values 

20 for DoAMoM with a coarse grid and then explore the 
neighborhood of the approximate optimum with a finer 
grid. 

Several values of initial ionic strength are 
25 tested, such as 1.0 mM, 5.0 mM, 10.0 mM and 20.0 mM. 

The elution rate is varied, by successive factors 
of 1/2, from the maximum attainable rate to 1/16 of 
this value. The fastest elution rate giving the good 
30 separation is optimal. 

The goal of the optimization is to obtain a sharp 
transition between bound and unbound GPs, triggered by 
increasing salt or decreasing pH or a combination of 
35 both. This optimization need be performed only: a) for 
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each temperature to be used, b) for each pH to be used, 
and c) when a new GP(IPBD) is created. 



Regulatable promoters are available for all 
5 genetic packages except, possibly, bacterial spores. A 
promoter functional in bacterial spores might be 
prepared by constructing a hybrid of a sporulation 
promoter and a regulatable bacterial promoter (e.g., 
lac ) , or by saturation mutagenesis of a sporulation 

10 promoter followed by screening for regulatable promoter 
activity (cf. OLIP86, 0LIP87) . When the promoter of 
the osp-ipbd gene is not regulatable, we optimize 
DoAMoM, the elution rate, and the amount of GP/ volume 
of matrix. If the optimized affinity separation is not 

15 acceptable, we must develop a means to alter the amount 
of IPBD per GP. 

Sec. 10.2: Measuring the sensitivity of affinity 

separation: 

20 

We determine the sensitivity of the affinity 
separation (C sens i) by measuring the minimum quantity 
of GP(IPBD) that can be detected in the presence of a 

25 large excess of wtGP. The user chooses a number of 
separation cycles, denoted N^rom' that will be 
performed before an enrichment is abandoned; 
preferably, N chrom is in the range 6 to 10 and N chrom 
must be greater than 4. Enrichment can be terminated 

30 by isolation of a desired GP(SBD) before N C hrom passes. 

The measurement of sensitivity is significantly 
expedited if GP(IPBD) and wtGP carry different 
selectable markers. 



35 
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Mixtures of GP(IPBD) and wtGP are prepared in the 
ratios of l:V lim , where Vn m ranges by an appropriate 
factor ( e.g. 1/10) over an appropriate range, typically 
10 11 through 10 4 . Large values of Vii m are tested 
first; once a positive result is obtained for one value 
of Vij^ no smaller values of need be tested. 

Each mixture is applied to a column supporting, at the 
optimal DoAMoM, an AfM(IPBD) having high affinity for 
IPBD and the column is eluted by the specified elution 
regime. The last fraction that contains viable GPs and 
an inoculum of the column matrix material are cultured. 
If GP(IPBD) and wtGP have different selectable markers, 
then transfer onto selection plates identifies each 
colony. Otherwise, a number ( e.g. 32) of GP clonal 
isolates are tested for presence of IPBD by the 
techniques discussed in Sec. 8. 

If IPBD is not detected on the surface of any of 
the isolated GPs, then GPs; are pooled from: a) the last 
few ( e.g. 3 to 5) fractions that contain viable GPs, 
and b) an inoculum taken from the column matrix. The 
pooled GPs are cultured and passed over the same column 
and enriched for GP(IPBD) in the manner described. 
This process is repeated until N^rom passes have been 
performed, or until the IPBD has been detected on the 
GPs. If GP(IPBD) is not detected after N chrom passes, 
v lim ^ s decreased and the process is repeated. 

c sensi equals the highest value of V±i m for which 
the user can recover GP(IPBD) within N chroin passes. 
The number of chromatographic cycles (K C y C ) that were 
needed to isolate GP(IPBD) gives a rough estimate of 
c eff ' c eff is approximately the K cyc th root of Vlim: 



c eff - (approx.) exp( log^V^^/Kcyc ) 
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For example, if were 4.0 x 10 8 and three 

separation cycles were needed to isolate GP(IPBD) , then 
c eff = (approx.) 736. 

5 

Sec, 10.3: Measuring the efficiency of separation : 

To determine C e ff more accurately , we determine 
the ratio of GP(IPBD)/wtGP loaded onto an AfM(IPBD) 
10 column that yields approximately equal amounts of 
GP(IPBD) and wtGP after elution. 

Sec. 10.4: Other Separation Means 

15 Other separation means are optimized in a manner 

parallel to the used for affinity chromatography. 

FACS (e.g. FACStar from Beckton-Dickinson, 
Mountain View, CA) is most appropriate for bacterial 

20 cells and spores because the sensitivity of the 
machines requires approximately 1000 molecules of 
fluorescent label bound to each GP to accomplish a 
separation. To optimize FACS separation of GPs, we use 
a derivative of Afm(IPBD) that is labeled with a 

25 fluorescent molecule, denoted Afm (IPBD)*. The 
variables that must be optimized include: a) amount of 
IPBD/GP, b) concentration of Afm (IPBD)*, c) ionic 
strength, d) concentration of GPs, and e) parameters 
pertaining to operation of the FACS machine. Because 

30 Afm (IPBD)* and GPs interact in solution, the binding 
will be linear in both [Afm (IPBD)*] and [displayed 
IPBD] . Preferably, these two parameters are varied 
together. The other parameters can be optimized 
independently . 

35 
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Electrophoresis is most appropriate to 
bacteriophage because of their small size (SERW87) . 
Electrophoresis is a preferred separation means if the 
target is so small that chemically attaching it to a 
5 column or to a fluorescent label would essentially 
change the entire target. For example, chloroacetate 
ions contain only seven atoms and would be essentially 
altered by any linkage. GPs that bind chloroacetate 
would become more negatively charged than GPs that do 
10 not bind the ion and so these classes of GPs could be 
separated. 



The parameters to optimize for electrophoresis 
include: a) IPBD/GP, b) concentration of gel material, 

15 e.g. agarose, c) concentration of Afm (IPBD) , d) ionic 
strength, e) size, shape, and cooling capacity of the 
electrophoresis apparatus, f) voltages and currents, 
and f) concentration of GPs. Preferably, IPBD/GP and 
[Afm (IPBD) ] are varied at the same time and other 

20 parameters are optimized independently. 



Part III 



Sec. 11.0: Choice of target material : 

25 

Any material may be chosen as target material, 
subject only to the following restrictions: 

If affinity chromatography is to be used, then: 

30 

1) the molecules of the target material must be of 
sufficient size and chemical reactivity to be 
applied to a solid support suitable for affinity 
separation, 



35 
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2) after application to a matrix, the target 
material must not react with water, 

3) after application to a matrix, the target 

5 material must not bind or degrade proteins in a 

non-specific way, and 

4) the molecules of the target material must be 
sufficiently large that attaching the material to 

10 a matrix allows enough unaltered surface area 

(generally at least 500 8 2 , excluding the atom 
that is connected to the linker) for protein 
binding. 

15 If FACS is to be used as the affinity separation 

means, then: 

1) the molecules of the target material must be of 
sufficient size and chemical reactivity to be 

20 conjugated to a suitable fluorescent dye or the 

target must itself be fluorescent, 

2) after any necessary fluorescent labeling, the 
target must not react with water, 

25 

3) after any necessary fluorescent labeling, the 
target material must not bind or degrade proteins 
in a non-specific way, and 

30 4) the molecules of the target material must be 

sufficiently large that * attaching the material to 
a suitable dye allows enough unaltered surface 
area (generally at least 500 A 2 , excluding the 
atom that is connected to the linker) for protein 

35 binding. 
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If affinity electrophoresis is to be used, then: 

1) the target must either be charged or of such a 
5 nature that its binding to a protein will change 

the charge of the protein, 

2) the target material must not react with water, 



10 



15 



20 



3) the target material must not bind or degrade 
proteins in a non-specific way, and 

4) the target must be compatible with a suitable 
gel material. 



Possible target materials include, but are not 
limited to: a) soluble proteins (such as horse heart 
myoglobin, human neutrophil elastase, activated (blood 
clotting) factor X, alpha-f etoprotein, alpha 
interferon, melittin, Bordetella pertussis adenylate 
cyclase toxin, any retroviral pol protease or any 
retroviral gag protease) , b) lipoproteins (such as 
human low density lipoprotein) , c) glycoproteins (such 
as a monoclonal antibody) , d) lipopolysaccharides (such 
25 as O-antigen of salmonella enteritidis) , e) nucleic 
acids (such as tRNAs, ribosomal RNAs, messenger RNAs 
dsDNA or ssDNA, possibly with sequence specificity) ; f ) 
soluble organic molecules (such as cholesterol , 
aspartame, bilirubin, morphine, codeine, 
30 dichlorodiphenyltrichlorethane (DDT), benzo (a)pyrene, 
prostaglandin PGE2, protoporphyrin IX, or actinomycin 
D) , g) organometallic complexes (such as iron haem or 
cobolt haem), h) organic polymers (such as cellulose or 
chitin) , i) insoluble minerals (such as asbestos, 
35 zeolites, or hydroxylapatite) , j) viral and phage coat 
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proteins (such as influenza haemaggutinin or phage 
lambda capsid) , and k) bacterial membrane or outer 
membrane proteins (such as LamB from E. coli or 
flagella proteins) . 

5 

A supply of several milligrams of pure target 
material is desired. Impure target material could be 
used, but one might obtain a protein that binds to a 
contaminant instead of to the target. 

10 

The following information about the target 
material is highly desirable: 

1) stability as a function of temperature, pH, and 
15 .ionic strength, 

2) stability with respect to chaotropes such as 
urea or guanidinium CI, 

20 3) pi, 

4) molecular weight, 

5) requirements for prosthetic groups or ions, 
25 such as ha em or Ca +2 , and 

6) proteolytic activity, if any. 

In addition to this most desirable information, it 
30 is useful to know: 1) the target's sequence, if the 
target is a macromolecule, 2) the 3D structure of the 
target, 3) enzymatic activity, if any, and 4) toxicity, 
if any. 
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The user of the present invention specifies 
certain parameters of the intended use of the binding 
protein: 

5 1) the acceptable temperature range, 

2) the acceptable pH range , 

3) the acceptable concentrations of ions and 
10 neutral solutes, 

4) the maximum acceptable dissociation constant 
for the target and the SBD: 

15 

K T - [Target] [SBD]/ [Target: SBD] 

In some cases, the user may require discrimination 
between T, the target, and N, some non-target. Let 

Kt - [T] [SBD]/[T:SBD] , and 
% = [N] [SBD]/ [N: SBD] , 

then K T /K N = ( [T] [N:SBD] )/ ( [N] [T:SBD] ) . 

The user then specifies a maximum acceptable value for 
the ratio Rp/%. 

If the target material is a general protease, one 
30 must consider the following points: 

1) a highly specific protease can be treated like 
any other target, 



20 



25 



35 



2) a general protease, such as subtilisin, may 
degrade the OSPs of the GP including OSP-PBDs; 
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there are several alternative ways of dealing with 
general proteases, including: a) a chemical 
inhibitor may be used to prevent proteolysis f e.q. 
phenylmethylfluorosulfate (PMFS) that inhibits 
5 serine proteases) , b) one or more active-site 

residues may be mutated to create an inactive 
protein f e.q, a serine protease in which the 
active serine is mutated to alanine) , or c) one or 
more active-site amino-acids of the protein may be 
10 chemically modified to destroy the catalytic 

activity f e.q, a serine protease in which the 
active serine is converted to anhydroserine) , 



3) SBDs selected for binding to a protease need 
15 not be inhibitors; SBDs that happen to inhibit 

the protease target are a fairly small subset of 
SBDs that bind to the protease target, 



4) the more we modify the target protease, the 
20 less like we are to obtain an SBD that inhibits 

the target protease, and 



5) if the user requires that the SBD inhibit the 
target protease, then the active site of the 

25 target protease must not be modified any more than 

necessary? inactivation by mutation or chemical 
modification are preferred methods of inactivation 
and a protein protease inhibitor becomes a prime 
candidate for IPBD. For example, BPTI could be 

30 mutated, by the methods of the present invention, 

to bind to proteases other than trypsin (TANK77 
and TSCH87) . 



35 
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sec. 12 .0: C hoice of GPdPBDl : 



The user must pick a GP(IPBD) that is suitable to 
the chosen target according to the criteria of Sec. 2. 
5 It is anticipated that a small collection of a 
* GP(IPBD)s can be assembled such that, for any chosen 
target, at least one member of the collection will be a 
suitable starting point for engineering a protein that 
binds to the chosen target by the methods of the 
10 present invention. The user should optimize the 
affinity separation for conditions appropriate to the 
intended use by the methods described in Part II. 



Sec. 13.0; Identification of Family of PBDs, Related 
15 to PPBD. to Be Generated 



Sec. 13.lt Choosing residues on IPBD (or other PPBD) 
to vary: 



20 We choose residues in the IPBD to vary through 

consideration of several factors, including: a) the 3D 
structure of the IPBD, b) sequences homologous to IPBD, 
and c) modeling of the IPBD and mutants of the IPBD. 
Because the number of residues that could strongly 

25 influence binding is always greater than the number 
that can be varied simultaneously, the user must pick a 
subset of those residues to vary at one time. The user 
must also pick trial levels of variegation and 
calculate the abundances of various sequences. The 

30 list of varied residues and the level of variegation at 
each varied residue are adjusted until the composite 
variegation is commensurate with C sens j[ and M n tv 



A key concept is that only structured proteins 
35 exhibit specific binding, i»e. can bind to a particular 
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chemical entity to the exclusion of most others. Thus 
the residues to be varied are chosen with an eye to 
preserving the underlying IPBD structure. 
Substitutions that prevent the PBD from folding will 
5 cause GPs carrying those genes to bind indiscriminately 
so that they can easily be removed from the population. 

Burial of hydrophobic surfaces so that bulk water 
is excluded is one of the strongest forces driving the 

10 binding of proteins to other molecules. Bulk water can 
be excluded from the region between two molecules only 
if the surfaces are complementary. We must test as 
many surfaces as possible to find one that is 
complementary to the target. The select ion- through- 

15 binding isolates those proteins that are more nearly 
complementary to some surface on the target. The 
effective diversity of a variegated population is 
measured by the number .of different surfaces, rather 
than the number of protein sequences. Thus we should 

20 maximize the number of surfaces generated in our 
population, rather than the number of protein 
sequences . 

In hypothetical example 1, we consider a 
25 hypothetical PBD, shown in Figure 3 binding to a 
hypothetical target. Figure 3 is a 2D schematic of 3D 
objects; by hypothesis, residues 1, 2, 4, 6, 7/ 13, 14, 
15, 20, 21, 22, 27, 29, 31, 33, 34, 36, 37, 38, and 39 
of the IPBD are on the 3D surface of the IPBD, even 
30 though shown well inside the circle. Proteins do not 

have distinct, countable faces. Therefore we define an « 
"interaction set" to be a set of residues such that all 
members of the set can simultaneously touch one ? 
molecule of the target material without any atom of the 
35 target coming closer than van der Waals distance to any 
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main-chain atom of the IPBD. The concept of a residue 
"touching" a molecule of the target is discussed below. 
One hypothetical interaction set, Set A, in Figure 3 
comprises residues 6, 7, 20, 21, 22, 33, and 34, 
5 represented by squares. Another hypothetical 
interaction set, Set B, comprises residues 1, 2, 4, 6, 
31, 37, and 39, represented by circles. 

If we vary one residue, number 21 for example, 
10 through all twenty amino acids, we obtain 20 protein 
sequences and 20 different surfaces for interaction set 
A. Note that residue 6 is in two interaction sets and 
variation of residue 6 through all 20 amino acids 
yields 20 versions of interaction set A and 2 0 versions 
15 of interaction set B. 

Now consider varying two residues, each through 
all twenty amino .acids, generating 400 protein 
sequences. If the two residues varied were, for 

20 example, number 1 and number 21, then there would be 
only 40 different surfaces because interaction set A 
does not depend on residue 1 and interaction set B does 
not depend on residue 21. If the two residues varied, 
however, were number 7 and number 21, then 400 surfaces 

25 would be generated. 

If N spatially separated residues are varied at 
one time, 20 x N surfaces are generated. Variation of 
N residues in the same interaction set yields 20 N 

30 surfaces. For example, if N = 7, variation of 
separated residues yields 140 surfaces while variation 
of interacting residues yields 20 7 =1.28 x lo 9 
surfaces. Thus, to maximize the number of surfaces 
generated when N residues are varied, all residues 

35 should be in the same interaction set* 
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The amount of surface area buried in strong 
protein-protein interactions ranges from 1000 S 2 to 
2000 ft 2 (SCHU79, pl03ff ) . Individual amino acids have 
5 total surface areas that depend mostly on type of amino 
acid and weakly on conformation. These areas range 
from about 180 fi 2 for glycine to about 360 S 2 for 
tryptophan. From amino-acid solvent exposures of 
published protein structures, we calculate that lOOOS 2 

10 on a protein surface comprises between 4 and 30 amino- 
acid residues. Varied amino acid sequences, as found 
in actual proteins, involve between 10 and 25 residues 
in forming 1000 S 2 of protein surface. Schulz and 
Schirmer estimate that 100 fi 2 of protein surface can 

15 exhibit as many as 1000 different specific patterns 
(SCHU79, pl05) . The number of surface patterns rises 
exponentially with the area that can be varied 
independently. One of the BPTI structures recorded in 
the Brookhaven Protein Data Bank (6PTI) , for example, 

20 has a total exposed surface area of 3997 & 2 (using the 
method of Lee and Richards (LEEB71) and a solvent 
radius of 1.4 & and atomic radii as shown in Table 7) . 
If we could vary this surface freely and if 100 S 2 can 
produce 1000 patterns, we could construct io 120 

25 different patterns by varying the surface of BPTI! 
This calculation is intended only to suggest the huge 
number of possible surface patterns based on a common 
protein backbone. 



30 One protein framework cannot, however, display all 

possible patterns over any one particular 100 fi 2 of 
surface merely by replacement of the side groups of 
surface residues. The protein backbone holds the 
varied side groups in approximately constant locations 

35 so that the variations are not independent. We can, 
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nevertheless, generate a vast collection of different 
protein surfaces by varying those protein residues that 
face the outside of the protein. 

Examination of a model of BPTI in contact with 
myoglobin shows that residues 3, 7, 8, 10, 13, 39, 41, 
and 42 can all simultaneously contact a molecule the 
size and shape of myoglobin. Residue 49 cannot touch a 
single myoglobin molecule simultaneously with any of 
the first set even though all are on the surface of 
BPTI. It is not the intent of the present invention, 
however, to use models to determine which part of the 
target molecule will actually be the site of binding by 
a PBD. 

For cassette mutagenesis, the protein residues to 
be varied are, preferably, close enough in sequence 
that the variegated DNA (vgDNA) encoding all of them 
can be made in one piece. The present invention is not 
limited to a particular length of vgDNA that can be 
synthesized. With current technology, a stretch of 60 
amino acids (180 DNA bases) can be spanned. 

One can use other mutational means, such as 
single-stranded-oligonucleotide-directed mutagenesis 
(B0TS85) using two or more mutating primers to mutate 
widely separated residues. 

Alternatively, to vary residues separated by more 
than sixty residues, two cassettes may be mutated. A 
first cassette is mutagenized to produce a population 
having, for example, up to 30,000 members. Using 
variegated OCV, we mutagenize a second cassette to 
produce a second variegated population having the 
desired diversity. 
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The composite level of variation must not exceed 
the prevailing capabilities to a) produce very large 
numbers of independently transformed cells or b) detect 
5 small components in a highly varied population. The 
limits on the level of variegation are discussed in 
Sec. 13.2. 

We assemble the data about the IPBD and the target 
10 that are useful in deciding which residues to vary 1) 
3D structure, or at least a list of residues on the 
surface of the IPBD, 2) list of sequences homologous to 
IPBD, and 3) model of the target" molecule or a stand-in 
for the target, 

15 

These data and an understanding of the behavior of 
different amino acids in proteins will be used to 
answer two questions: 

20 1) which residues of the IPBD are on the outside 

and close enough together in space to touch the 
target simultaneously? 

2) which residues of the IPBD can be varied with 
25 high probability of retaining the underlying IPBD 

structure? 

Although an atomic model of the target material 
from X-ray crystallography, NMR, etc. is preferred in 

30 such examination, it is not necessary. For example, if 
the target were a protein of unknown 3D structure, it 
would be sufficient to know the molecular weight of the 
protein and whether it were a soluble globular protein, 
a fibrous protein, or a membrane protein. One can then 

35 choose a protein of known structure of the same class 
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and similar size and shape to use as a molecular stand- 
in and yardstick. At low resolution, all proteins of a 
given size and class look much the same. The specific 
volumes are the same, all are more or less spherical 
5 and therefore all proteins of the same size and class 
have about the same radius of curvature. The radii of 
curvature of the two molecules determine how much of 
the two molecules can come into contact. 

10 The most appropriate method of picking the 

residues of the protein chain at which the amino acids 
should be varied is by viewing, with interactive 
computer graphics, a model of the IPBD. A stick-figure 
representation of molecules is preferred. A suitable 

15 set of hardware is an Evans & Sutherland PS390 graphics 
terminal (Evans & Sutherland Corporation, Salt Lake 
City, UT) and a MicroVAX II supermicro computer 
(Digital Eguipment Corp., Maynard, MA). Suitable 
programs for viewing and manipulating protein models 

20 include: a) PS-FRODO, written by T. A. Jones (JONE85) 
and distributed by the Biochemistry Department of Rice 
University, Houston, TX; and b) PROTEUS, developed by 
Dayringer, Tramantano, and Fletterick (DAYR86) . 

25 Theoretical calculations, such as dynamic 

simulations of proteins, are used to estimate the 
effect of substitution at a particular residue of a 
particular amino-acid type on the 3D structure of the 
parent protein. Such calculations might also indicate 

30 whether a particular substitution will greatly affect 
the flexibility of the protein. 

Sec. 13.1.1; The principal set: 
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Using the knowledge of which residues are on the 
surface of the IPBD, we pick residues that are close 
enough together on the surface of the IPBD to touch a 
molecule of the target simultaneously without having 
5 any IPBD main-chain atom come closer than van der Waals 
distance ( vi2. 4.0 to 5.0 A) from any target atom. A 
residue of the IPBD "touches" the target if: a) a main- 
chain atom is within van der Waals distance , viz. 4.0 
to 5.0 S of any atom of the target molecule, or b) the 

10 C be ta is within D cutoff of any atom of the target 
molecule so that a side-group atom could make contact 
with that atom. Because side groups differ in size 
( cf . Table 35) , some judgment is required in picking 
D cutoff- In tlie preferred embodiment, we will use 

15 Dcutoff " 8 »° S, but other values in the range 6,0 S to 
10.0 fi could be used. If IPBD has G at a residue, we 
construct a pseudo Cj^^a with the correct bond distance 
and angles and judge the ability of the residue to 
touch the target from this pseudo C^eta* 

20 

Alternatively, we choose a set of residues on the 
surface of the IPBD such that the curvature of the 
surface defined by the residues in the set is not so 
great that it would prevent contact between all 
25 residues in the set and a molecule of the target. This 
method is appropriate if the target is a macromolecule, 
such as a protein, because the PBDs derived from the 
IPBD will contact only a part of the macromolecular 
surface. 

30 

We prefer that there be some indication that the 
underlying IPBD structure will tolerate substitutions 
at each residue in the principal set of residues. 
Indications could come from various sources, including: 
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a) homologous sequences, b) static computer modeling, 
or c) dynamic computer simulations. 

The residues in the principal set need not be 
5 contiguous in the protein sequence. We require only 
that the amino acids in the residues to be varied all 
be capable of touching a molecule of the target 
material simultaneously without having atoms overlap. 
If the target were, for example, horse heart myoglobin, 
10 and if the IPBD were BPTI, any set of residues in one 
interaction set of BPTI defined in Table 34 could be 
picked. 



Preferably, the principal set contains eight to 
15 sixteen residues. This number of residues allows 
sufficient variability that a surface that is 
complementary to the target can be found, but is small 
enough that a significant fraction of the surface can 
be varied at one time. 

20 

Sec. 13 . 1.2: The secondary set: 

The secondary set comprises residues that touch 
residues in the primary set, and are excluded from the 
25 primary set because the residue: a) is internal, b) is 
highly conserved, or c) is on the surface, but the 
curvature of the IPBD surface prevents the residue from 
being in contact with the target at the same time as 
one or more residues in the primary set. 

30 

Internal residues, although frequently conserved 
and may tolerate some conservative changes such as I to 
L or F to Y. These changes affect the detail placement 
and dynamics of adjacent protein residues and such 
35 variation may be useful once an SBD is found. 
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Surface residues in the secondary set are most 
often located on the periphery of the principal set, 
which do not make direct contact with the target 
5 simultaneously with all other residues of the principal 
set. The charge on the amino acid in one of these 
residues could, however, have a strong effect on 
binding. It is appropriate to vary the charge of some 
or all of these residues to improve an SBD. For 
10 example, the variegated codon containing equimolar A 
and G at base 1, equimolar C and A at base 2, and A at 
base 3 yields amino acids T, A, K, and E with equal 
probability. 

15 Sec. 13.1.3: Choice of residues to vary initially: 

The allowed level of variegation that assures 
progressively determines how many residues can be 
varied at once; geometry determines which ones. 

20 

The user picks residues to vary in many ways; the 
following is a preferred manner. Pairs of residues are 
picked that are diametrically opposed across the face 
of the principal set. Two such pairs are used to 

25 delimit the surface, up/down and right/left. 
Alternatively, three residues that form an inscribed 
triangle, having as large an area as possible, on the 
surface are picked. One to three other residues are 
picked in a checkerboard fashion across the interaction 

30 surface. Choice of widely spaced residues to vary 
creates the possibility for high specificity because 
all the intervening residues must have acceptable 
complementarity before favorable interactions can occur 
at widely-separated residues. 

35 
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The number of residues picked is coupled to the 
range through which each can be varied by the 
restrictions discussed in Sec. 13.2. In the first 
round, we do not assume any binding between IPBD and 
5 the target and so progressivity is not an issue. At 
the first round, the user may elect to produce a level 
of variegation such that each molecule of vgDNA is 
potentially different through, for example, unlimited 
variegation of 10 codons (20 10 approx. = 10 13 ) . One 

10 run of the DNA synthesizer produces approximately 10 13 
molecules of length 100 nts. Inefficiencies in 
ligation and transformation will reduce the number of 
proteins actually tested to between 10 7 and 5 x 10 8 . 
Multiple iterations of the process with such very high 

15 levels of variegation will not yield repeatable 
results; the user must decide whether this is 
important. 

Sec. 13.2: Range of variati on at Each Site — of 

20 Mutation; 

The total level of variegation is the product of 
the number of variants at each varied residue. Each 
varied residue can have a different scheme of 
25 variegation, producing 2 to 20 different possibilities. 
We require that the process be progressive, i.e. each 
variegation cycle produces a better starting point for 
the next variegation cycle than the previous cycle 
produced. 

30 

N. B. : Setting the level of variegation such 
that the ppbd and many sequences related to 
the ppbd sequence are present in detectable 
amounts insures that the process is 
35 progressive. If the level of variegation is 
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so high that the ppbd sequence is present at 
such low levels that there is an appreciable 
chance that no trans formant will display the 
PPBD, then the best SBD of the next round 
5 could be worse than the PPBD. At excessively 

high level of variegation, each round of 
mutagenesis is independent of previous rounds 
and there is no assurance of progress ivity. 
This approach can lead to valuable binding 
10 proteins, but repetition of experiments with 

this level of variegation will not yield 
progressive results. Excessive variation is 
not preferred. 



15 If the level of variegation is such that the 

parental sequence and each single amino-acid change is 
present for selection r then we know that a selected 
sequence is closer to optimal or the same as the 
parent. If, on the other hand, very high levels of 

20 variegation are used, a sequence may be selected, not 
because it is superior to the parental sequence, but 
because the parental and improved sequences are, by 
chance , absent . 



25 Progressivity is not an all-or-nothing property. 

So long as most of the information obtained from 
previous variegation cycles is retained and many 
different surfaces that are related to the PPBD surface 
are produced, the process is progressive. If the level 

30 of variegation is so high that the ppbd gene may not be 
detected, the assurance of progressivity diminishes. 
If the probability of recovering PPBD is negligible, 
then the probability of progressive behavior is also 
negligible. 

35 
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15 



20 



25 



30 



An opposing force in our design considerations is 
that PBDs are useful in the population only up to the 
amount that can be detected; any excess above the 
detectable amount is wasted. Thus we produce as many 
surfaces related to PPBD as possible within the 
constraint that the PPBD be detectable. 

We defer specification of exactly how much 
variegation is allowed until we have: a) specified real 
nt distributions for a variegated codon, and b) 
examined the effects of discrepancies between specified 
nt distributions and actual nt distributions. 

Sec. 13.3: Design of vaDNA Encoding PBD Family; 

We must now decide how to distribute the 
variegation within the codons for the residues to be 
varied. These decisions are influenced by the nature 
of the genetic code. When vgDNA is synthesized, 
variation at the first base of a codon creates a 
population containing amino acids from the same column 
of the genetic code table (as shown in the Table 3-6 on 
p87 of WATS87) ; variation at the second base of the 
codon creates a population containing amino acids from 
the same row of the genetic code table; variation at 
the third base of the codon creates a population 
containing amino acids from the same box. If two or 
three bases in the same codon are varied, the pattern 
is more complicated. Work with 3D protein structural 
models may suggest definite sets of amino acids to 
substitute at a given residue, but the method of 
variation may require either more or fewer kinds of 
amino acids be included. For example, examination of a 
model might suggest substitution of N or Q at a given 
residue. Combinatorial variation of codons requires 
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that mixing N and Q at one location also include K and 
H as possibilities at the same residue. One must 
choose to put: 1) N only, 2) Q only, or 3) a mixture of 
N, K, H, and Q. The present invention does not rely on 
5 accurate predictions of which amino acids should be 
placed at each residue , rather attention is focused on 
which residues should be varied. 

There are many ways to generate diversity in a 
10 protein. (See RICH86, CARU85, and OLIP86.) One extreme 
case is that one or a few residues of the protein are 
varied as much as possible ( inter alia see CARU85, 
CARU87, RICH86, and WHAR86) . We will call this limit 
"Focused Mutagenesis". Focused Mutagenesis is 
15 appropriate when the IPBD or other PPBD shows little or 
no binding to the target, as at the beginning of the 
search for a protein to bind to a new target material. 
When there is no binding between the PPBD and the 
target, we preferably pick a set of five to seven 
20 residues and vary each through all 20 possibilities. 

An alternative plan of mutagenesis ("Diffuse 
Mutagenesis") is to vary many more residues through a 
more limited set of choices (See Vershon et al. , Chl5 

25 of IN0U86 and PAKU86) . This can be accomplished by 
spiking each of the pure nts activated for DNA 
synthesis ( e.g. nt-phosphoramidites) with a small 
amount of one or more of the other activated nts. 
Contrary to general practice, the present invention 

30 sets the level of spiking so that only a small 
percentage ( 1% to .00001%, for example ) of the final 
product contains the initial DNA sequence. Many 
single, double, triple, and higher mutations occur, but 
recovery of the basic sequence is a possible outcome. 

35 Let Nfc be the number of bases to be varied, and let Q 
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be the fraction of all sequences that should have the 
parental sequence, then M, the fraction of the mixture 
that is the majority component, is 

5 M = exp{ log e (Q)/N b } = 10 d^lO (Q)/N b ) . 

If, for example, thirty base pairs on the DNA 
chain were to be varied and 1% of the product is to 
have the parental sequence, then each mixed nt 

10 substrate should contain 86% of the parental nt and 14% 
of other nts. Table 8 shows the fraction (fn) of DNA 
molecules having n non-parental bases when 30 bases are 
synthesized with reagents that contain fraction M of 
the majority component. When M=. 63096, f24 and higher 

15 are less than 10~ 8 . The entry "most" in Table 8 is the 
number of changes that has the highest probability. 
Note that substantial probability for multiple 
substitutions only occurs if the fraction of parental 
sequence (fO) is allowed to drop to around 10~ 6 . 

20 Mutagenesis of this sort can be applied to any part of 
the protein at any time, but is most appropriate when 
some binding to the target has been established. The 
N b base pairs of the DNA chain that are synthesized 
with mixed reagents need not be contiguous. They are 

25 picked so that between N b /3 and N b codons are affected 
to various degrees. The residues picked for mutation 
are picked with reference to the 3D structure of the 
IPBD, if known. For example, one might pick all or 
most of the residues in the principal and secondary 

30 set. We may impose restrictions on the extent of 
variation at each of these residues based on homologous 
sequences or other data. The mixture of non-parental 
nts need not be random, rather mixtures can be biased 
to give particular amino acid types specific 

35 probabilities of appearance at each codon. For 
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example, one residue may contain a hydrophobic amino 
acid in all known homologous sequences; in such a case, 
the first and third base of that codon would be varied, 
but the second would be set to T. This diffuse 
5 structure-directed mutagenesis will reveal the subtle 
changes possible in protein backbone associated with 
conservative interior changes, such as V to I, as well 
as some not so subtle changes that require concomitant 
changes at two or more residues of the protein, 

10 

For Focused Mutagenesis, we now consider the 
distribution of nts that will be inserted at each 
variegated codon. Each codon could be programmed 
differently. If we have no information indicating that 

15 a particular amino acid or class of amino acid is 
appropriate, we strive to substitute all amino acids 
with equal probability because representation of one 
pbd above the detectable level is wasteful. Equal 
amounts of all four nts at each position in a codon 

20 yields the amino acid distribution in which each amino 
acid is present in proportion to the number of codons 
that code for it. This distribution has the 
disadvantage of giving two basic residues for every 
acidic residue. In addition, six times as much R, S, 

25 and L as W or M occur. If five codons are synthesized 
with this distribution , sequences encoding five Rs are 
7776-times more abundant than sequences encoding five 
Ws. To have W-W-W-W-W present at detectable levels, we 
must have R-R-R-R-R present in 7776-fold excess. 

30 

Let Abun(x) be the abundance of DNA sequences 
coding for amino acid x, defined by the distribution of 
nts at each base of the codon. For any distribution, 
there will be a most-favored amino acid (mfaa) with 
35 abundance Abun(mfaa) and a least- favored amino acid 
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(lfaa) with abundance Abun(lfaa) . We seek the nt 
distribution that allows all twenty amino acids and 
that yields the largest' ratio Abun(lfaa)/Abun(mfaa) 
subject to two constraints: equal abundances of acidic 
5 and basic amino acids and the least possible number of 
stop codons. Thus only nt distributions that yield 
Abun(E)+Abun(D) = Abun(R)+Abun(K) are considered, and 
the function maximized is: 

10 { (l-Abun(stop) ) (Abun(lfaa)/Abun(mfaa) ) }. 

We have simplified the search for an optimal nt 
distribution by limiting the third base to T or G (C or 
G is equivalent) • All amino acids are possible and the 
15 number of accessible stop codons is reduced because TGA 
and TAA codons are eliminated. The amino acids F, Y, 
C, H, N, I, and D require T at the third base while W, 
M, Q, K, and E require G. Thus we use an equimolar 
mixture of T and G at the third base. 

20 

A computer program, written as part of the present 
invention and named "Find Optimum vgCodon" (See Table 
9) , varies the composition at bases 1 and 2, in steps 
of 0.05, and reports the composition that gives the 
25 largest value of the quantity { (Abun(lfaa)/Abun(mfaa) 
(l-Abun(stop) ) ) }. A vg codon is symbolically defined 
by the nt distribution at each base: 







T 


C 


A 


G 


30 base 


#1 = 


tl 


Cl 


al 


gi 


4 base 


#2 = 


t2 


C2 


a2 


g2 


base 


#3 = 


t3 


C3 


a3 


g3 



35 



tl + cl + al + gl = 
t2 + c2 + a2 + g2 = 



1.0 
1.0 
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t3 = g3 = 0.5, c3 = a3 = 0. 

The variation of the quantities tl, cl, al f gl, t2, c2, 
5 a2, and g2 is subject to the constraint that 
Abun (E) +Abun (D) equals Abun (K) +Abun (R) ; 

Abun(E)+Abun(D) = gl*a2 

Abun(K)+Abun(R) = al*a2/2 + cl*g2 + al*g2/2 

10 

gl*a2 = al*a2/2 + cl*g2 + al*g2/2 

Solving for g2, we obtain 

15 g2 = (gl*a2 - 0.5*al*a2)/(cl + 0.5*al) . 

In addition, 

tl = 1 - al - cl - gl 
20 t2 = 1 - a2 - c2 - g2 

We vary al, cl, gl, a2, and c2 and then calculate tl, 
g2, and t2. Initially, variation is in steps of 5%. 
Once an approximately optimum distribution of nts is 
25 determined, the region is further explored with steps 
of 1%. The logic of this program is shown in Table 9. 
The optimum distribution is: 

Optimum vqCodon 

30 

T C A G 

base #1 = 0.26 0.18 0.26 0.30 
base #2 = 0.22 0.16 0.40 0.22 
base #3 = 0.5 0.0 0.0 0-5 

35 




PCT/US89/03731 



WO 90/02809 



99 



» 
» 

10 



15 



20 



25 



and yields DNA molecules encoding each type amino acid 
with the abundances shown in Table 10. 

The computer that controls a DNA synthesizer, such 
as the Milligen 7500, can be programmed to synthesize 
any base of an oligo-nt with any distribution of nts by 
taking some nt substrates ( e>a. nt phosphoramidites) 
from each of two or more reservoirs. Alternatively, nt 
substrates can be mixed in any ratios and placed in one 
of the extra reservoir for so called "dirty bottle" 
synthesis. 

The actual nt distribution obtained will differ 
from the specified nt distribution due to several 
causes, including: a) differential inherent reactivity 
of nt substrates, and b) differential deterioration of 
reagents. It is possible to compensate partially for 
these effects, but some residual error will occur. We 
denote the average discrepancy between specified and 
observed nt fraction as S err , 

s err ~ s ^^ re root f average[ (f 0 bs f spec) / f spec ] ) 

were f 0 b S is the amount of one type of nt found at a 
base and f spe c is the amount of that type of nt that 
was specified at the same base. The average is over 
all specified types of nts and over a number ( e.g. 10 
or 20) different variegated bases. By hypothesis, the 
actual nt distribution at a variegated base will be 
within 5% of the specified distribution. Actual DNA 
synthesizers and DNA synthetic chemistry may have 
different error levels. It is the user's 
responsibility to determine S err for the DNA 
synthesizer and chemistry employed. 



35 
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To determine the possible effects of errors in nt 
composition on the amino-acid distribution, we modified 
the program "Find Optimum vgCodon" in four ways: 



5 1) the fraction of each nt in the first two bases 

is allowed to vary* from its optimum value times (1 
" S err ) to the optimum value times (1 + S err ) in 
seven- equal steps (S err is the hypothetical 
fractional error level entered by the user) ; the 
10 sum of nt fractions at one base always equals 1.0, 



2) g2 is varied in the same manner as a2, i.e. we 
dropped the restriction that Abun(D) +Abun(E) = 
Abun(K)+Abun(R) , 

15 

3) t3 and g3 are varied from 0.5 times (1 - S err ) 
to 0.5 times (1 + S err ) in three equal steps, 

4) the smallest ratio Abun(lfaa)/Abun(mfaa) is 
20 sought. 



In actual experiments, we will direct the synthesizer 
to produce the optimum DNA distribution "Optimum 
vgCodon" given above. Incomplete control over DNA 

25 chemistry may, however, cause us to actually obtain the 
following distribution that is the worst that can be 
obtained if all nt fractions are within 5% of the 
amounts specified in "Optimum vgCodon". A 
corresponding table can be calculated for any given 

30 S err using the program "Find worst vgCodon within Serr 
of given distribution." given in Table 11. 



Optimum vgCodon, worst 5% errors 



35 



T C A G 
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base #1 = 0.251 
base #2 = 0.209 
base #3 = 0.475 



0.189 



0.273 



0.287 



0.160 



0.400 



0.231 



0.0 



0.0 



0.525 
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This distribution yields DNA encoding different 
amino acids at the abundances shown in Table 12. 

If five codons are synthesized with reagents mixed 
so as to produce the nt-distribution "Optimum vgCodon", 
and if we actually obtained the nt-distribution 
"Optimum vgCodon, worst 5% errors", then DNA sequences 
encoding the mfaa at all of the five codons are about 
277 times as likely as DNA sequences encoding the Ifaa 
at all of the five codons; about 24% of the DNA 
sequences will have a stop codon in one or more of the 
five codons. 

When five codons are synthesized using equimolar 
mixtures at bases 1 and 2, (Abun(mfaa)/Abun(lfaa) ) 5 = 
7776- If we program the optimum nt distribution and 
come within 5%, then (Abun(mfaa)/Abun(lf aa) ) 5 = 277. 
The total number of different PBDs is unchanged, but 
the least-favored sequence is about 28 times more 
abundant. Detecting the least-favored amino-acid 
sequence when varying four residues with equimolar nts 
at each varied base requires as sensitive a separation 
system as does detecting the least-favored amino-acid 
sequence when varying five residues with the optimized 
nt distribution. 

By hypothesis, the distribution "Optimal vgCodon" 
is used in the second version of the second variegation 
of hypothetical example 2. The abundance of the DNA 
encoding each type of amino acid is, however, taken 
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from the Table 12. The abundance of DNA encoding the 
parental amino acid sequence is: 

Amount (parental seq.) 
5 F24 G30 D34 E42 T47 

= Abun(F) * Abun(G) * Abun(D) * Abun(E) * Abun(T) 
« .0249 x .0663 X .0545 X .0602 x .0437 
= 2.4 X KT 7 

10 Therefore, DNA encoding the PPBD sequence as well as 
very many related sequences will be present in 
sufficient quantity to be detected and we are assured 
that the process will be progressive. 

15 A level of variegation that allows recovery of the 

PPBD has two properties: 

1) we cannot regress because the PPBD is 
available, 

20 

2) an enormous number of multiple changes related 
to the PPBD are available for selection and we are 
able to detect and benefit from these changes. 

25 The user must adjust the list of residues to be 

varied and levels of variegation at each residue until 
the calculated variegation is within the bounds set by 

M ntv and c sensi- 

30 Preferably, we also consider the interactions 

between the sites of variegation and the surrounding 
DNA. If the method of mutagenesis to be used is 
replacement of a cassette, we consider whether the 
variegation will generate gratuitous restriction sites 

35 and whether they seriously interfere with the intended 
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introduction of diversity. We reduce or eliminate 
gratuitous restriction sites by appropriate choice of 
variegation pattern and silent alteration of codons 
neighboring the sites of variegation. See the Detailed 
Example. 

Sec. 14.1; Insertion of synthetic vaDNA i nto a 
Plasmids: 

For cassette mutagenesis, restriction sites were 
designed and synthesized, and are used to introduce the 
synthetic vgDNA into the OCV. Restriction digestions 
and ligations are performed by standard methods 
(AUSU87) . In the case of single-stranded- 

oligonucleo tide-directed mutagenesis , synthetic vgDNA 
is used to create diversity in the vector (BOTS85) . 

Sec. 14.2; Transformation of cells: 

The present invention is not limited to any one 
method of transforming cells with DNA. Standard 
methods, such as thos described in MANI82, may be 
optimized for the particular host cells and OCV. The 
goal is to produce a large number of independent 
trans formants, preferably 10 7 of more. It is not 
necesS ary to isolate transformed cells between 
transformation and affinity separation. We prefer to 
have transformed cells at high concentration so that 
they can be plated densely on relatively few plates. 

Sec. 14.3: Growth of the GPfvaPBD^ popul ation; 



35 



The transformed cells are grown first under non- 
selective conditions that allow expression of plasmid 
genes and then selected to kill untransformed cells. 



WO 90/02809 



PCT/US89/03731 



104 

Transformed cells are then induced to express the osp- 
pbd gene at the appropriate level of induction, as 
determined in Sec. 10.1. The GPs carrying the IPBD are 
harvested by a method appropriate to the package. 

5 

A high level of diversity can be generated by in 
vitro variegated synthesis of DNA and this diversity 
can be maintained passively through several generations 
in an organism without positive selective pressure. 

10 Loss or reduction in frequency of deleterious mutations 
is advantageous for the purposes of the present 
invention. It is preferable that the selection is must 
be performed before more than a few generations elapse. 
Moreover, subdividing the variegated population before 

15 amplification in an organism by removing a small sample 
(less than 10%) for further work would result in loss 
of diversity; therefore, one should use all or most of 
the synthetic DNA and most or all of the transformed 
cells . 

20 

Sec. 15>t Isolation of GPfPBDVs with binding-to- 
tarcret phenotypes : 

The harvested packages are enriched for the 
25 binding-to-target phenotype by use of affinity 
separation involving target material immobilized on a 
matrix. Packages that fail to bind to target material 
are washed away. If the packages are bacteriophage or 
endospores, it may be desirable to include a 
30 bacteriocidal agent, such as azide, in the buffer to 
prevent bacterial growth. 

Sec. 15.1: Attaching the target material to a column: 



105 

Affinity column chromatography is the preferred 
method of affinity separation, but other affinity 
separation methods may be used. A variety of 
commercially available support materials for affinity 
chromatography are used. These include derivatized 
beads to which the target 'material is covalently 
linked, or non-derivatized material to which the target 
material adheres irreversibly. 

Suppliers of support material for affinity 
chromatography include: Applied Protein Technologies 
Cambridge, MA; Bio-Rad Laboratories, Rockville Center, 
NY; Pierce Chemical Company, Rockford, IL. Target 
materials are attached to the matrix in accord with the 
directions of the manufacturer of each matrix 
preparation with consideration of good presentation of 
the target. 

Sec. 15.2! Reducing selection due to non-specific 
binding; 

We reduce non-specific binding of GP(PBD)s to the 
matrix that bears the target in two ways: 

1) we treat the column with blocking agents such 
as genetically defective GPs or a solution of 
protein before the population of GP(vgPBD)s is 
chromatographed , and 

2) we pass the population of GP(vgPBD)s over a 
matrix containing no target or a different target 
from the same class as the actual target prior to 
affinity chromatography. 
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Step (1) above saturates any non-specific binding that 
the affinity matrix might show toward wild-type GPs or 
proteins in general; step (2) removes components of our 
population that exhibit non-specific binding to the 
5 matrix or to molecules of the same class as the target. 
If the target were horse heart, myoglobin, for example, 
a column supporting bovine serum albumin could be used 
to trap GPs exhibiting PBDs with strong non-specific 
binding to proteins. If cholesterol were the target, 

10 then a hydrophobic compound, such as p- 
tertiarybutylbenzyl alcohol, could be used to remove 
GPs displaying PBDs having strong non-specific binding 
to hydrophobic compounds. It is anticipated that PBDs 
that fail to fold or that are prematurely terminated 

15 will be non-specifically sticky. The capacity of the 
initial column that removes indiscriminately adhesive 
PBDs should be greater ( e.g. 5 fold greater) than the 
column that supports the target molecule. 

20 Variation in the support material (polystyrene, 

glass, agarose, etc. ) in analysis of clones carrying 
SBDs is used to eliminate enrichment for packages that 
bind to the support material rather than the target. 

25 Sec. 15.3: Elutina the column: 

The population of GPs is applied to an affinity 
matrix under conditions compatible with the intended 
use of the binding protein and the population is 
30 fractionated by passage of a gradient of some solute 
over the column. The process enriches for PBDs having 
affinity for the target and for which the affinity for 
the target is least affected by the eluants used. 
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Ions or cofactors needed for stability of PBDs 
(derived from IPBD) or target must be included in 
buffers at appropriate levels. We first remove 
GP(PBD)s that do not bind the target by washing the 
5 matrix with the volume of the initial buffer required 
to bring the optical density (at 260 nm or 280 nm) back 
to base line plus one to five void volumes (V v ) m The 
column is then eluted with a gradient of increasing: a) 
salt, b) [H+] (decreasing pH) , c) neutral solutes, d) 

10 temperature (increasing or decreasing) , or e) some 
combination of these factors. Salt is the most 
preferred solute for gradient formation. Other solutes 
that generally weaken non-covalent interaction may also 
be used. "Salt" includes solutions containing any of 

15 the following ionic- species: 



25 



Na+ 


K+ 


Ca++ 


Mg++ 


NH 4 + 


Li+ 


Sr++ 


Ba++ 


Rb+ 


Cs+ 


CI- 


Br- 


S0 4 — 


HS0 4 - 


PO4— 


HP0 4 ~ 


H 2 P0 4 - 


C0 3 — 


non- 


Acetate 


Citrate 


Standard 1- 
Amino Acids 


standard 
nucleotides 


Guanidinium 
CI 



30 

Other ionic or neutral solutes may be used. All 
solutes are subject to the necessity that they not kill 
the genetic packages. Neutral solutes, such as 

35 ethanol, acetone, ether, or urea, are frequently used 
in protein purification, however, many of these are 
very harmful to bacteria and bacteriophage above low 
concentrations. Bacterial spores, on the other hand, 
are impervious to most neutral solutes. Several passes 

40 may be made through the steps in Sec. 15. Different 
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solutes may be used in different analyses, salt in one, 
pH in the next, etc . 

Sec. 15.4; Recovery of packages: 

5 

Recovery of packages that display binding to an 
affinity column may be achieved in several ways, 
including from: 

10 1) fractions eluted with a gradient as described 

above; 

2) fractions eluted with soluble target material, 

3) cells grown in situ on the matrix, 

4) cells incubated with parts of the matrix, 
15 5) fractions eluted after chemically or 

enzymatically degrading the linkage holding the 
target to the matrix, and 
6) regeneration of GPs after degrading the 
packages and recovering OCV DNA. 

20 

It is possible to utilize combinations of these 
methods. It should be remembered that what we want to 
recover from the affinity matrix is not the GPs per se , 
but the information in them. Recovery of viable GPs is 
25 very strongly preferred, but recovery of genetic 
material is essential. 



Inadvertent inactivation of the GPs is very 
deleterious. It is preferred that maximum limits for 

30 solutes that do not inactivate the GPs or denature the 
target or the column are determined. One may use 
conditions that denature the column to elute GPs; 
before the target is denatured, a portion of the 
affinity matrix should be removed for possible use as 

35 an inoculum. As the GPs are held together by protein- 
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protein interactions and other non-covalent molecular 
interactions, there will be cases in which the 
molecular package will bind so tightly to the target 
molecules on the affinity matrix that the GPs can not 
5 be washed off in viable form. This will only occur 
when very tight binding has been obtained. In these 
cases, methods (3) through (5) above can be used to 
obtain the bound packages or the genetic messages from 
the affinity matrix. 

10 

It is possible, by manipulation of the elution 
conditions, to isolate SBDs that bind to the target at 
one pH (pH b ) but not at another pH (pH Q ) . The 
population is applied at pH b and the column is washed 

15 thoroughly at pH^. The column is then eluted with 
buffer at pH Q and GPs that come off at the new pH are 
collected and cultured. Similar procedures may be used 
. for other solution parameters, such as temperature. 
For example, GP(vgPBD)s could be applied to a column 

20 supporting insulin. After eluting with salt to remove 
GPs with little or no binding to insulin, we elute with 
salt and glucose to liberate GPs that display PBDs that 
bind insulin or glucose in a competitive manner. 

25 Sec, 15.5: .Amplifying the Enriched Packages 

Viable GPs having the selected binding trait are 
amplified by culture in a suitable medium, or, in the 
case of phage, infection into a host so cultivated. If 
30 the GPs have been inactivated by the chromatography, 
the OCV carrying the osp - pbd gene must be recovered 
from the GP, and introduced into a new, viable host. 



35 



Sec. 15.6: Determining whether further enrichment is 
needed: 
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The probability of isolating a GP with improved 
binding increases by C e ff with each separation cycle. 
Let N be the number of distinct amino-acid sequences 
5 produced by the variegation. We want to perform K 
separation cycles before attempting to isolate an SBD, 
where K is such that the probability of isolating a 
single SBD is 0.10 or higher. 

10 K = the smallest integer>= log 10 (0.10 N)/log 10 (C^f f ) 

For example, if N were 1.0 x 10 7 and C e ff = 6.31 x 10 2 
then log 10 (1.0 x 10 6 )/log 10 (6.31 x 10 2 ) = 6.0000/2.8000 
= 2.14. Therefore we would attempt to isolate SBDs 
15 after the third separation cycle. After only two 
separation cycles , the probability of finding an SBD is 
(6.31 x 10 2 ) 2 /(1.0 x 10 7 ) = .04 and attempting to 
isolate SBDs might be profitable. 

20 Clonal isolates from the last fraction eluted in 

Sec. 15.3 containing any viable GPs, as well as clonal 
isolates obtained by culturing an inoculum taken from 
the affinity matrix, are cultured. If K separation 
cycles have been completed, samples from a number, e.g. 

25 32, of these clonal isolates are tested for elution 
properties on the {target} column. If none of the 
isolated, genetically pure GPs show improved binding to 
target, or if K cycles have not yet been completed, 
then we pool and culture, in a manner similar to the 

30 manner set forth in Sec. 14.3, the GPs from the last 
few fractions eluted (see Sec. 15.4) that contained 
viable GPs and from the GPs obtained by culturing an 
inoculum taken from the column matrix. We then repeat 
the enrichment procedure described in Sec. 15. This 
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cyclic enrichment may continue Nc^om passes or until 
an SBD is isolated. 

If one or more of the isolated GPs has improved 
retention on the {target} column, we determine whether 
the retention of the candidate SBDs is due to affinity 
for the target material. Target material is attached 
to a different support matrix at optimal density and 
the elution volumes of candidate GP(SBD)s are measured. 
We pick the candidate that either has the highest 
elution volume or that is retained on the column after 
elution. If none of the candidate GP(SBD)s has higher 
elution volume than GP(PPBD of this round) , then we 
pool and culture the GPs from the last few fractions 
that contained viable GPs and the GPs obtained by 
culturing an inoculum taken from the column matrix. We 
then repeat the enrichment procedure of Sec. 15. 

If all of the SBDs show binding that is superior 
to PPBD of this round, we pool and' culture the GPs from 
the last fraction that contains viable GPs and from the 
inoculum taken from the column. This population is re- 
chromatographed at least one pass to fractionate 
further the GPs based on K<j. 

If an RNA phage were used as GP, the RNA would 
either be cultured with the assistance of a helper 
phage or be reverse transcribed and the DNA amplified. 
The amplified DNA could then be sequenced or subcloned 
into suitable plasmids. 

Sec. 15.7: Characterizing the Population; 



We characterize members of the population showing 
35 desired binding properties by genetic and biochemical 
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methods. We obtain clonal isolates and test these 
strains by genetic and affinity methods to determine 
genotype and phenotype with respect to binding to 
target. For several genetically pure isolates that 
5 show binding, we demonstrate that the binding is caused 
by the artificial chimeric gene by excising the osp-sbd 
gene and crossing it into the parental GP. We also 
ligate the deleted backbone of each GP from which the 
oso-sbd is removed and demonstrate that each backbone 
10 alone cannot confer binding to the target on the GP. 
We sequence the osp-sbd gene from several clonal 
isolates . 

Sec. 15.8: Testing of binding affinity: 

15 

For one or more clonal isolates, we subclone the 
sbd gene fragment, without the osp fragment, into an 
expression vector such that each SBD can be produced as 
a free protein. Each SBD protein is purified by normal 

20 means, including affinity chromatography. Physical 
measurements of the strength of binding are then made 
on each free SBD protein by one of the following 
methods: 1) alteration of the Stokes radius as a 
function of binding of the target material, measured by 

25 characteristics of elution from a molecular sizing 
column such as agarose, 2) retention of radiolabeled 
SBD on a spun affinity column to which has been affixed 
the target material, or 3) retention of radiolabeled 
target material on a spun affinity column to which has 

30 been affixed the SBD. The measurements of binding for 
each free SBD are compared to the corresponding 
measurements of binding for the PPBD. 



In each assay, we measure the extent of binding as 
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a function of concentration of each protein, and other 
relevant physical and chemical parameters. 



In addition, the SBD with highest affinity for the 
5 target from each round is compared to the best SBD of 
the previous round (IPBD for the first round) and to 
the IPBD with respect to affinity for the target 
material. Successive rounds of mutagenesis and 
selection-through-binding yield increasing affinity 
10 until desired levels are achieved. 

If binding is not yet sufficient, we must decide 
which residues to vary next (see Sec. 16.0). 

15 Sec. 15,9: Other Affinity Separation Means; 

FACs may be used to separate GPs that bind 
fluorescent labeled target with the optimized 
parameters determined in Part II. We discriminate 
20 against art if actual binding to the fluorescent lable by 
using two or more different dyes, chosen to be 
structurally different. 

Electrophoretic affinity separation uses unaltered 
25 target so that only other ions in the buffer can give 
rise to artif actual binding. Artif actual binding to 
the gel material gives rise to retardation independent 
of field direction and so is easily eliminated. A 
variegated population of GPs will have a variety of 
3 0 charges . 

First the variegated population of GPs is 
electrophoresed in a gel that contains no target 
material. The electrophoresis continues until the GPs 
35 are distributed along the length of the lane. The 
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target-free lane in which the initial electrophoresis 
is conducted is separated by a removable baffle from a 
square of gel that contains target material . The 
baffle is removed and a second electrophoresis is 
conducted at right angles to the first. GPs that do 
not bind target migrate with unaltered mobility while 
GPs that do bind target will separate from the majority 
that do not bind target. A diagonal line of non- 
binding GPs will form. This line is excised and 
discarded. Other parts of the gel are dissolved and 
the GPs cultured. 

Sec. 16.0: The Next Variegation Cycle: 

Which residues of the PBD should be varied in the 
next variegation cycle? The general rule is to 
preserve as much accumulated information as possible. 
The amino acids just varied are the ones best 
determined. The- environment of other residues has 
changed, so that it is appropriate to vary them again. 
Because there are always more residues in the principal 
and secondary sets than can be varied simultaneously, 
we start by picking residues that either have never 
been varied (highest priority) or that have not been 
varied for one or more cycles. If we find that varying 
all the residues except those varied in the previous 
cycle does not allow a high enough level of diversity, 
then residues varied in the previous cycle might be 
varied again. For example, if ^-hB number of 
independent trans formants that can be produced a nd ^he 
sensitivity of the affinity separation were such that 
seven residues could be varied, and if the principal 
and secondary sets contained 13 residues, we would 
always vary seven residues, even though that implies 
varying some residue twice in a row. In such cases, we 



PCT/US89/03731 



WO 90/02809 



115 



* 

10 



15 



20 



25 



30 

Jr 

35 



would pick the residues just varied that contain the 
amino acids of highest abundance in the variegated 
codons used. 

It is the accumulation of information that allows 
the process to select those protein sequences that 
produce binding between the SBD and the target- Some 
interfaces between proteins and other molecules involve 
twenty or more residues* Complete variation of twenty 
residues would generate 10 26 different proteins. By 
dividing the residues that lie close together in space 
into overlapping groups of five to seven residues, we 
can vary a large surface but never need to test more 
than 10 7 to 10 9 candidates at once, a savings of 10 19 
to 10 17 fold. 

Having picked the residues to vary, we again set 
the range of variegation for each residue according to 
the principles set forth in 13.2, design the vgDNA 
encoding the desired mutants (Sec. 13.3), clone the 
vgDNA into GPs (Sec. 14), and select-by-binding-to- 
target those GPs bearing SBDs (Sec. 15). 

Sec. 17.0; OTHER CONSIDERATIONS: 

sec. 17.1 : Joint selections: 

One may modify the affinity separation of the 
method described to select a molecule that binds to 
material A but not to material B. One needs to prepare 
two selection columns, one with material A and the 
other with material B. The population of genetic 
packages is prepared in the manner described, but 
before applying the population to A, one passes the 
population over the B column so as to remove those 
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members of the population that have high affinity for 
B. It may be necessary to amplify the population that 
does not bind to B before passing it over A. 
Amplification would most likely be needed if A and B 
5 were in some ways similar and the PPBD has been 
selected for having affinity for A. 

For example, to obtain an SBD that binds A but not 
B, three columns could be connected in series: a) a 

10 column supporting some compound, neither A nor B, or 
only the matrix material, b) a column supporting B, and 
c) a column supporting A. A population of GP(vgPBD)s 
is applied to the series of columns and the columns are 
washed with the buffer of constant ionic strength that 

15 is used in the application. The columns are uncoupled, 
and the third column is eluted with a gradient to 
isolate GP(PBD)s that bind A but not B. 



One can also generate molecules that bind to both 
20 A and B. In this case we use a 3D model and mutate one 
face of the molecule in question to get binding to A. 
We then mutate a different face to produce binding to 
B. 

25 The materials A and B could be proteins that 

differ at only one or a few residues. For example, A 
could be a natural protein for which the gene has been 
cloned and B could be a mutant of A that retains the 
overall 3D structure of A. SBDs selected to bind A but 

30 not B must bind to A near the residues that are mutated 
in B. If the mutations were picked to be in the active 
site of A (assuming A has an active site) , then an SBD 
that binds A but not B will bind to the active site of 
A and is likely to be an inhibitor of A. 
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To obtain a protein that will bind to both A and 
B, we can, alternatively, first obtain an SBD that 
binds A and a different SBD that binds B. We can then 
combine the genes encoding these domains so that a two- 
domain single-polypeptide protein is produced. The 
fusion protein will have affinity for both A and B. 

One can also generate binding proteins with 
affinity for both A and B, such that these materials 
compete for the same site on the binding protein. We 
guarantee competition by overlapping the sites for A 
and B. We first create a molecule that binds to target 
material A. We then vary a set of residues defined as: 
a) those residues that were varied to obtain binding to 
A, plus b) those residues close in 3D space to the 
residues of set (a) but that are internal and so are 
unlikely to bind directly to either A or B. Residues 
in set (b) are likely to make small changes in the 
positioning of the residues in set (a) such that the 
affinities for A and B will be changed by small 
amounts. Members of these populations are selected for 
affinity to both A and B. 

Sec. 17.2; Selection for non-binding: 

The method of the present invention can be used to 
select proteins that do not bind to selected targets. 
Consider a protein of pharmacological- importance, such 
as streptokinase, that is antigenic to an undesirable 
extent. We can take the pharmacologically important 
protein as IPBD and antibodies against it as target. 
Residues on the surface of the pharmacologically 
important protein would be variegated and GP(PBD)s that 
do not bind to an antibody column would be collected 
and cultured. Surface residues may be identified in 
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several ways, including: a) from a 3D structure, b) 
from hydrophobicity considerations , or c) chemical 
labeling. The 3D structure of the pharmacologically 
important protein remains the preferred guide to 
picking residues to vary, except now we pick residues 
that are widely spaced so that we leave as little as 
possible of the original surface unaltered. 

Destroying binding frequently requires only that a 
single amino acid in the binding interface be changed. 
If polyclonal antibodies are used, we face the problem 
that all or most of the strong epitopes must be altered 
in a single molecule. Preferably, one would have a set 
of monoclonal antibodies, or a narrow range of antibody 
species. If we had a series of monoclonal antibody 
columns, we could obtain one or more mutations that 
abolish binding to each monoclonal antibody. We could 
then combine some or all of these mutations in one 
molecule to produce a pharmacologically important 
protein recognized by none of the monoclonal 
antibodies. Such mutants must be tested to verify that 
the pharmacologically interesting properties have not 
be altered to an unacceptable degree by the mutations. 

Typically, polyclonal antibodies display a range 
of binding constants for antigen. Even if we have only 
polyclonal antibodies that bind to the 
pharmacologically important protein, we may proceed as 
follows. We engineer the pharmacologically important 
protein to appear on the surface of a replicable GP. 
We introduce mutations into residues that are on the 
surface of the pharmacologically important protein or 
into residues thought to be on the surface of the 
pharmacologically important protein so that a 
population of GPs is obtained. Polyclonal antibodies 
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are attached to a column and the population of GPs is 
applied to the column at low salt. The column is 
eluted with a salt gradient. The GPs that elute at the 
lowest concentration of salt are those which bear 
5 pharmacologically important proteins that have been 
mutated in a way that eliminates binding to the 
antibodies having maximum affinity for the 
pharmacologically important protein. The GPs eluting 
at the lowest salt are isolated and cultured. The 
10 isolated SBD becomes the PPBD to further rounds of 
variegation so that the antigenic determinants are 
successively eliminated. 

Sec. 17.3: Selection of PBDs for retention — of 
15 structure ; 

We can select for insertions or deletions that 
preserve the 3D structure of known binding proteins. 
Consider on GP that express BPTI on its surface. In 

20 the bnti-osp gene, we can replace the codons for K26 
and A27 with five variegated codons (3.2 x 10 6 
sequences) . K26 and A27 are in a turn and are far from 
the trypsin binding surface. We use selection-through- 
binding to isolate GPs expressing mutants of BPTI that 

25 retain high, specific affinity for trypsin. 

Sec. 17.4: Created binding p roteins not unique; 



For each target, there are a large number of SBDs 
30 that may be found by the method of the present 
invention. To increase the probability that some PBD 
in the population will bind to the target, we generate 
as large a population as we can conveniently subject to 
selection-through-binding. Key questions in management 
35 of the method are "How many trans formants can we 
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produce? 11 , and "How small a component can we find 
through selection-through-binding? 11 . Geneticists 
routinely find mutations with frequencies of one in 
10 10 using simple r powerful selections. The optimum 
5 level of variegation is determined by the maximum 
number of transformants and the selection sensitivity, 
so that for any reasonable sensitivity we may use a * 
progressive process to obtain a series of proteins with 
higher and higher affinity for the chosen target 
10 material. Enrichments of 1000-fold by a single pass of 
elution from an affinity plate have been demonstrated 
(SMIT85) . 

Use of different variation schemes can yield 
15 different binding proteins. For any given target , a 
large plurality of proteins will bind to it. Thus, if 
one binding protein turns out to be unsuitable for some 
reason ( e.g. too antigenic) , the procedure can be 
repeated with different variation parameters . For 
20 example, one might choose different residues to vary or 
pick a different nt distribution at variegated codons 
so that a new distribution of amino acids is tested at 
the same residues. Even if the same principal set of 
residues is used, one might obtain a different SBD if 
25 the order in which one picks subsets to be varied is 
altered. 

Sec. 17.5: Other modes of mutagenesis possible: 

30 The modes of creating diversity in the population 

of GPs discussed herein are not the only modes * 
possible. Any method of mutagenesis that preserves at 
least a large fraction of the information obtained from * 
one selection and then introduces other mutations in 

35 the same domain will work. The limiting factors are 
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the number of independent transf ormants that can be 
produced and the amount of enrichment one can achieve 
through affinity separation. Therefore the preferred 
embodiment uses a method of mutagenesis that focuses 
5 mutations into those residues that are most likely to 
affect the binding properties of the PBD and are least 
likely to destroy the underlying structure of the IPBD. 



Other modes of mutagenesis might allow other GPs 
10 to be considered. For example, the bacteriophage 
lambda is not a useful cloning vehicle for cassette 
mutagenesis because of the plethora of restriction 
sites. One can, however, use single-stranded-oligo- 
nt-directed mutagenesis on lambda without the need for 
15 unique restriction sites. No one has used single- 
stranded-oligo-nt-directed mutagenesis to introduce the 
high level of diversity called for in the present 
invention, but if it is possible, such a method would 
allow use of phage with large genomes. 



20 
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Example 1 



BPTI-Derived Binding Protein for HHMb; Displayed bv M13 
5 Phage 

Presented below is a hypothetical example of a 
protocol for developing a new binding molecule derived 
from BPTI with affinity for horse heart myoglobin 

10 (HHMb) using the common coli bacteriophage M13 as 
genetic package. It will be understood that some 
further optimization , in accordance with the teachings 
herein, may be necessary to obtain the desired results. 
Possible modifications in the preferred method are 

15 discussed immediately following various steps of the 
hypothetical example. 



20 



25 



By hypothesis, we set the following technical 
capabilities: 

Y D q 500 ng/synthesis of ssDNA 100 bases 

long, 

10 ug/ synthesis of ssDNA 60 bases long, 
1 mg/ synthesis of ssDNA 20 bases long. 

M DNA 100 bases 



Y p i 1 mg/1 

30 L ef 0.1 % for blunt-blunt, 

4 % for sticky-blunt, 
11 % for sticky-sticky. 

M ntv 5 x 108 



35 
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c eff 900-fold enrichment 

c sensi 1 in 4 x 10 8 



N chrom 10 passes 



s err 0*05 



10 Example 1, Part I 



In this example, we will use M13 as a replicable 
GP and BPTI as IPBD. In Part I, we are concerned only 
with getting BPTI displayed on the outer surface of an 

15 M13 derivative. - Variable DNA may be introduced in the 
osp-iobd gene, but not within the region that codes for 
the trypsin-binding region of BPTI. Once BPTI is 
displayed on the M13 outer surface of an M13 
derivative, we proceed to Part II to optimize the 

20 affinity separation procedures. 

For this example, we choose a filamentous 
bacteriophage of coli, M13. We prefer phage over 
vegetative bacterial cells because phage are much less 
25 metabolically active. We prefer phage over spores 
because the molecular mechanisms of the virion 
formation and 3D structure of the virion are much 
better understood than are the corresponding processes 
of spore formation and structures of spores. 

30 

M13 is a very well studied bacteriophage, widely 
used for DNA sequencing and as a genetic vector; it is 
a typical member of the class of filamentous phages. 
The relevant facts about M13 and other phages that will 
35 allow us to choose among phages are cited in Sec. 
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1.3.1. 



Compared to other bacteriophage , filamentous phage 
in general are attractive and M13 in particular is 
5 especially attractive because: 

1) the 3D structure of the virion is known, 

2) the processing of the coat protein is well 
10 understood, 

3) the genome is expandable, 

4) the genome is small, 

15 

5) the sequence of the genome is known, 

6) the virion is physically resistant to shear, 
heat, cold, guanidinium CI, low pH, and high salt, 

20 

7) the phage is a sequencing vector so that 
sequencing is especially easy, and 

8) antibiotic-resistance genes have been cloned 
25 into the genome with predictable results (HINE80) . 

Other criteria listed in Sec. 1.0 and 1.3 of the are 
also satisfied: M13 is easily cultured and stored 
(FRIT85) , each infected cell yielding 100 to 1000 M13 

30 progeny after infection. M13 has no unusual or 
expensive media requirements and is easily harvested 
and concentrated (SALI64, YAMA70, FRIT85) . M13 is 
stable toward physical agents: temperature (10% of 
phage survive 30 minutes at 85°C) , shear (Waring 

35 blender does not kill) , desiccation (not applicable) , 
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radiation (not applicable), age (stable for years). 



M13 is stable toward chemicals: pH (< 2.2 
(SMIT85)), surface active agents: not applicable, 
5 chaotropes (guanidinium HCl = 6.0 M) , ions (no specific 
sensitivities) , organic solvents (ether and other 
organic solvents are lethal (MARV78)), proteases (not 
applicable, HHMb not a protease) . M13 is not known to 
be sensitive to other enzymes. 

10 

Ml 3 genome is 6423 b.p. and the sequence is known 
(SCHA78). Because the genome is small, cassette 
mutagenesis is practical on RF M13 (AUSU87) , as is 
single- stranded oligo-nt directed mutagenesis (FRIT85) . 

15 M13 is a plasmid and transformation system in itself, 
and an ideal sequencing vector. M13 can be grown on 
Rec~ strains of Ej. coli . The M13 genome is expandable 
(MESS78, FRIT85) . M13 confers no advantage, but 
doesn f t lyse cells. The sequence of gene VIII is 

20 known, and the amino acid sequence can be encoded on a 
synthetic gene, using lacUVS promoter and used in 
conjunction with the LacI^ repressor. The lacUV5 
promoter is induced by IPTG. Gene VTII protein is 
secreted by a well studied process and is cleaved 

25 between A23 and A24. Residues 18, 21, 22, and 23 of 
gene VIII protein control cleavage. Mature gene VIII 
protein makes up the sheath around the circular ssDNA. 
The 3D structure of fl virion is known at medium 
resolution; the amino terminus of gene VIII protein is 

30 on surface of the virion. No fusions to M13 gene VIII 
protein have been reported. The 2D structure of M13 
coat protein is implicit in the 3D structure. Mature 
M13 gene VIII protein has only one domain. There are 
four minor proteins: gene III, VI, VII, and IX. Each 

35 of these minor proteins is present in about 5 copies 
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per virion and is related to morphogenesis or 
infection. The major coat protein is present in more 
than 2500 copies per virion. 

5 Although no fusions of M13 gene VIII to other 

genes have been reported, knowledge of the virion 3D 
structure (BANN810) makes attachment of IPBD to the 
amino terminus of mature M13 coat protein (M13 CP) 
quite attractive. Should direct fusion of BPTI to M13 
10 CP fail to cause BPTI to be displayed on the surface of 
M13, we will vary part of the BPTI sequence and/ or 
insert short random DNA sequences between BPTI and M13 
CP. 

15 Smith (SMIT85) and de la Cruz et al^ (CRUZ88) have 

shown that insertions into gene III cause novel protein 
domains to appear on the virion outer surface. If BPTI 
can not be made to appear on the virion outer surface 
by fusing the bpti gene to the m!3cp gene, we will fuse 

20 bpti to gene III either at the site used by Smith and 
by de la Cruz et al. or to one of the termini. We will 
use a second, synthetic copy of gene III so that some 
unaltered gene III protein will be present. 

25 The gene VIII protein is chosen as OSP because it 

is present in many copies and because its location and 
orientation in the virion are known. Note that any 
uncertainty about the azimuth of the coat protein about 
its own alpha helical axis is unimportant. 

30 

The 3D model of fl indicates strongly that fusing 
BPTI to the amino terminus of M13 CP is more likely to 
yield a functional protein than any other fusion site. 
(See Sec. 1.3.3) . 

35 
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The amino-acid sequence of M13 pre-coat (SCHA78) , 
called AA_seql, is 

AA_seql 

5 

1 1 2||2 3 3 4 4 5 
5050 V5 05050 
MKKSLVLKASVAVATLVPMLSFAAEGDDPAKAAFNSLQASATEYIGYAWA 

10 5 6 6 7 7 

5 0 5 0 3 
MVWTVGATIGIKLFKKFTSKAS 



15 The single-letter codes for amino acids and the codes 
for ambiguous DNA are internationally recognized 
(GEOR87) . The best site for inserting a novel protein 
domain into Ml 3 CP is after A23 because SP-I cleaves 
the precoat protein after A23, as indicated by the 

20 arrow. Proteins that can be secreted will appear 
connected to mature Ml 3 CP at its amino terminus. 
Because the amino terminus of mature M13 CP is located 
on the outer surface of the virion, the introduced 
domain will be displayed on the outside of the virion. 

25 

BPTI is chosen as IPBD of this example (See Sec. 
2.1) because it meets or exceeds all the criteria: it 
is a small, very stable protein with a well known 3D 
structure. Marks et al. (MARK86) have shown that a 
30 fusion of the phoA signal peptide gene fragment and DNA 
coding for the mature form of BPTI caused native BPTI 
to appear in the periplasm of L. coli , demonstrating 
that there is nothing in the structure of BPTI to 
prevent its being secreted. 

35 

Marks et al. (MARK87) also showed that the 
structure of BPTI is stable even to the removal of one 
of the cystine bridges. They did this by replacing 
both C14 and C38 with either two alanines or two 
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threonines. The C14/C38 cystine bridge that Marks et 
al. removed is the one very close to the scissile bond 
in BPTI; surprisingly, both mutant molecules 
functioned as trypsin inhibitors. This indicates that 
5 BPTI is redundantly stable and so is likely to fold 
into approximately the same structure despite numerous 
surface mutations. Using the knowledge of homologues, 
vide infra , we can infer which residues must not be 
varied if the basic BPTI structure is to be maintained. 

10 

The 3D structure of BPTI has been determined at 
high resolution by X-ray diffraction (HUBE77, MARQ83, 
WLOD84, WLOD87a, WLOD87b) , neutron diffraction 
(WLOD84) , and by NMR (WAGN87) . In one of the X-ray 

15 structures deposited in the Brookhaven Protein Data 
Bank, "6PTI", there was no electron density for A58, 
indicating that A58 has no uniquely defined 
conformation. Thus we know that the carboxy group does 
not make any essential interaction in the folded 

20* structure. The amino terminus of BPTI is very near to 
the carboxy terminus. Goldenberg and Creighton 
reported on circularized BPTI and circularly permuted 
BPTI (GOLD83). Some proteins homologous to BPTI have 
more or fewer residues at either terminus. 

25 

BPTI has been called "the hydrogen atom of protein 
folding" and has been the subject of numerous 
experimental and theoretical studies (STAT87, SCHW87, 
G0LD83, CHAZ83) . 

30 

BPTI has the added advantage that at least 32 
homologous proteins are known, as shown in Table 13. A 
tally of ionizable groups is shown in Table 14 and the 
composite of amino acid types occurring at each residue 
35 is shown in Table 15. 
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BPTI is freely soluble and is not known to bind 
metal ions. BPTI has no known enzymatic activity. 
BPTI binds to trypsin, K d 6.0 x 10~ 14 M (TSCH87). 
BPTI is not toxic. If K15 of BPTI is changed to L, 
there is no measurable binding between the mutant BPTI 
and trypsin (TSCH87) . 

All of the conserved residues are buried; of the 
seven fully conserved residues only G37 has noticeable 
exposure. The solvent accessibility of each residue in 
BPTI is given in Table 16 which was calculated from the 
entry "6PTI" in the Brookhaven Protein Data Bank with a 
solvent radius of 1.4 A, the atomic radii given in 
Table 7, and the method of Lee and Richards (LEEB71) . 
Each of the 51 non-conserved residues can accommodate 
two or more kinds of amino acids. By independently 
substituting at each residue only those amino acids 
already observed at that residue, we could obtain 
approximately 7 x 10 42 different amino acid sequences, 
most of which will fold into structures very similar to 
BPTI. 

BPTI will be useful as a IPBD for macromolecules. 
(See Sec. 2.1.1)»BPTI and BPTI homologues bind tightly 
and with high specificity to a number of enzymes. 

BPTI is strongly positively charged except at very 
high pH, thus BPTI is useful as IPBD for targets that 
are not also strongly positive under the conditions of 
intended use (see Sec. 2.1.2). There exist homologues 
of BPTI, however, having quite different charges ( viz. 
SCI-III from Bombvx mori at -7 and the trypsin 
inhibitor from bovine colostrum at -1) . Once a 
derivative of M13 is found that displays BPTI on its 
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surface, the sequence of the BPTI domain can be 
replaced by one of the homologous sequences to produce 
acidic or neutral IPBDs. 

5 BPTI is not an enzyme (See Sec. 2.1.3). BPTI is 

quite small; if this should cause a pharmacological 
problem, two or more BPTI-derived domains may be joined 
as in the human BPTI homologue that has two domains. 

10 A derivative of M13 is the preferred OCV. (See 

Sec. 3) . A "phagemid" is a hybrid between a phage and 
a plasmid, and is used in this invention. Double- 
stranded plasmid DNA isolated from phagemid-bearing 
cells is denoted by the standard convention, e.g. 

15 pXY24. Phage prepared from these cells would be 
designated XY24. Phagemids such as Bluescript K/S 
(sold by Stratagene) are not suitable for our purposes 
because Bluescript does not contain the full genome of 
M13 and must be rescued by coinfection with helper 

20 phage. Such coinfections could lead to genetic 
recombination yielding heterogeneous phage unsuitable 
for the purposes of the present invention. 

The bacteriophage M13 bla 61 (ATCC 37039) is 
25 derived from wild-type M13 through the insertion of the 
beta lactamase gene (HINE80) . This phage contains 8.13 
kb of DNA. M13 bla cat 1 (ATCC 37040) is derived from 
M13 bla 61 through the additional insertion of the 
chloramphenicol resistance gene (HINE80) ; M13 bla cat l 
30 contains 9.88 kb of DNA. Although neither of these 
variants of M13 contains the ColEl origin of 
replication, either could be used as a starting point 
to construct a usable cloning vector for the present 
example. 
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The OCV for the current example Is constructed by 
a process illustrated in Figure 4. A brief description 
of all the plasiaids and phagemids constructed for this 
Example is found in Table 17. 

For ss oligo-nt site-directed mutagenesis, 
multiple primers lead to higher efficiency. Three non- 
mutagenic primers are used: bases 2326-2352 of wt M13, 
bases 4854-4875 of wt M13, and the complement of bases 
3431-3451 of pBR322. Note that pLG2 and its 
derivatives carry the anti-sense strand of the amp R 
gene in the + DNA strand. The segments are picked to 
be high in GC content and to divide the pLG7 genome 
into several segments of approximately equal length. 

The genetic engineering procedures needed to 
construct the OCV are standard, using commercially 
available restriction enzymes under recommended 
conditions. All restriction fragments of DNA are 

purified by electrophoresis or HPLC. ' M13 and its 
engineered derivatives are infected into E^ coli strain 
PE384 (F + ,Rec~, Sup + ,Amp s ) . Plasmid DNA of Ml 3 
derivatives is transformed into coli strain PE383(F~ 
,Rec~,Sup + ,Amp s ) so that we avoid multiple rounds of 
infection in the culture. Isolation of M13 phage is by 
the procedure of Salivar et al^ (SALI64) ; isolation of 
replicative form (RF) M13 is by the procedure of 
Jazwinski et al. (JAZW73a and JAZW73b) . Isolation of 
plasmids containing the ColEl origin of replication is 
by the method of Maniatis (MANI82) . 

We pick the am]D R gene from pBR322 as a convenient 
antibiotic resistance gene. Another resistance gene, 
such as kanamycin, could be used. The Acc I-to- Aat II 
fragment of pBR322 is a conveniently obtained source of 
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any R and the Col El origin. 

M13iapl8 (New England BioLabs) contains neither Aat 
II nor Acc I sites. Therefore we insert an adaptor 
5- that allows us to insert the Aat II-to-Acc I fragment 
of pBR322 that carries the amp R gene and the ColEl 
origin of replication into a desirable place in 
M13mpl8. M13mpl8 contains a lacUV5 promoter and a lacZ 
gene that are not useful to the purposes of the present 
10 invention. By cutting M13mpl8 with Ava il and Bsu36I 
and discarding the approximately 600 intervening base 
pairs, we eliminate all recognition sites of several 
enzymes useful for engineering the bpti-gene VIII gene. 

15 The following adaptor is synthesized, 

5 1 GACCGACGTCtgcctcGTATACCGGACCGcatagctCC 3 1 olig#l 
3» GCTGCAGacggagCATATGGCCTGGCgtatcgaGGACT 5 1 olig#2 
Avail | Aatll| |Accl[RsrII | | Bsu3 61 

20 

The annealed adaptor is ligated with RF M13mpl8 
that has been cut with both Ava il and Bsu36I and 
purified by PAGE or HPLC. Transformed cells are 
25 selected for plasmid uptake with ampicillin. The 
resulting construct is called pLGl. 

DNA from pLGl is cut with both Aat II and Acc I. 
Aatll-to- Acc I fragment of pBR322 is ligated to the 
30 backbone of LG1. The correct construct is named pLG2. 

The Acc I restriction site is no longer needed for 
vector construction. To eliminate this site, RF pLG2 
dsDNA is cut with Acc I, treated with Klenow fragment 
35 and dATP and dTTP to make it blunt and then religated. 
The cloning vector, named pLG3, is now ready for 
stepwise insertion of the osp-ipbd gene. 
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We are now ready to design a gene (See Sec. 4) 
that will cause BPTI-domains to appear on the outer 
surface of an M13 derivative: LG7 • 

5 

To obtain a novel protein domain attached to the 
♦ outside of M13, we insert DNA that codes for mature 

BPTI after A23 of the precoat protein of M13. Mature 
BPTI begins with an arginine residue, which is charged; 
10 cleavage by signal peptidase I is normal in such cases. 
Signal peptidase I (SP-I) cuts a chimera of M13 coat 
protein and BPTI after A23 leaving mature BPTI attached 
at its carboxy end to the amino terminus of M13 CP. 



15 The following amino-acid sequence, called AA_seq2, 

is constructed, by inserting the sequence for mature 
BPTI (shown underscored) immediately after the signal 
sequence of M13 precoat protein (indicated by the 
arrow) and before the sequence for the M13 CP. 

20 

AA_seq2 

1 1 2||2 3 3 4 4 5 
5050 \/5 0 5050 
2 5 MTnc^T.VT.TCASVAVATLVPMLSF ARPDFCLEPPYTGPCKARIIRYFYNAKA 



566778899 10 
50505 0 5050 
3 0 GLCOTFVYGGCRAKRNNFKSAEDCMRTCGGAA EGDDPAKAAFNSLQASAT 



10 11 11 12 12 13 
5 0 5 0 5 0 
35 EYIGYAWAMWVIVGATIGIKLFKKFTSKAS 



40 



Sequence numbers of fusion proteins refer to the 
fusion, as coded, unless otherwise noted. Thus the 
alanine that begins M13 CP is referred to as "number 



WO 90/02809 



PCT/US89/03731 



134 

82", "number 1 of M13 CP", or "number 59 of the mature 
BPTI-M13 CP fusion". 



The osp-iobd gene is regulated by the lacUVS 
5 promoter and terminated by the trpA transcription 
terminator. The host strain of L coli harbors the 
lacl 3 gene. The osp-ipbd gene is expressed and 
processed in parallel with the wild- type gene VIII. 
The novel protein, that consists of BPTI tethered to a 

10 M13 CP domain, constitutes only a fraction of the coat. 
Affinity separation is able to separate phage carrying 
only five or six copies ' of a molecule that has high 
affinity for an affinity matrix (SMIT85) ; 1% 
incorporation of the chimeric protein results in about 

15 30 copies of the protein exposed on the surface. If 
this is insufficient, additional copies may be provided 
by, for example, increasing IPTG. 

A model comprising M13 coat, after the model for 
20 fl of Marvin and colleagues (BANN81) , and a BPTI 
domain, taken from the Brookhaven Protein Data Bank 
entry "6PTI", was constructed by standard model 
building methods that insure that covalent bond lengths 
and angles are close to acceptable values. The model 
25 shows that the fusion protein could fit into the 
supramolecular structure in a stereochemical^ 
acceptable fashion without disturbing the internal 
structure of either the M13 CP or BPTI domain. 

30 The ambiguous DNA sequence coding for AA_seq2, is 

examined by a computer program for places where 
recognition sites for restriction enzymes could be 
created without altering the amino-acid sequence. (See 
Sec. 4.3). A master table of enzymes is compiled from 

35 the catalogues of enzyme suppliers. The enzymes that 
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do not cut the OCV. 



(Preferably constructed as 



» 

10 



15 



20 



25 



30 



described above) . 

Using the procedure given in Sec. 4.3, we design a 
ipbd gene, such as that shown in Table 25. Some 
restriction enzymes ( e.g. Ban I or Hph I) cut the OCV 
too often to be of value. 

The entire DNA sequence of the ml3cp-bt)ti fusion 
with annotation appears in Table 25 showing the useful 
restriction sites and biologically important features, 
viz. the lacUVS promoter, the lacO operator, the Shine- 
Dalgarno sequence, the amino acid sequence, the stop 
codons, and the transcriptional terminator. 

The ipbd gene is synthesized in several steps 
using the method described in Sec. 5.1, generating 
dsDNA fragments of 150 to 190 base pairs. 

The four steps (See Sec. 6.1) by which we clone 
synthetic fragments of the ml3cp-bpti gene (the osp- 
ipbd gene of the present example) into pLG3 and its 
derivatives are illustrated in Figure 5. 

The sequence to be introduced into pLG3 comprises 
a) the segment from RsrII to Avrll (Table 25), b) a 
spacer sequence (gccgctcc) , and c) the segment from 
Asu II to Sau l. The segment is 158 bases long and is 
synthesized from two shorter synthetic oligo-nts as 
described in Sec. 5.1 of the generic specification. 

Table 27 shows the antisense strand of the 
sequence to be inserted. The 99 base fragment shown in 
upper case letters and underscored (5 1 - 
CCGTCC. . . .CCTTCG-3 1 = olig#3) is synthesized in the 
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standard manner. Similarly, the 100 base long fragment 
of the sense strand shown in lower case (5 1 - 
cgctca. • . . aattg-3 1 = olig#4) is synthesized. After 
annealing, the double-stranded region is extended with 
5 Klenow fragment by the procedure given above to make 
the entire 176 bases double stranded. The overlap 
region is 23 base pairs long and contains 14 CG pairs 
and 9 AT pairs. The DNA between Avrl l and AsuII does 
not code for anything in the final pbd gene; it is 

10 there so that the DNA can be cut by both Avr ll and 
Asu II at the same time in the next step. Eight bases 
have been added to the left of Rsr II and nine bases 
have been added to the left of Sau l (same specificity 
and cutting pattern as Bsu36I) . These bases at the 

15 ends are not part of the final product; they must be 
present so that the restriction enzymes can bind and 
cut the synthetic DNA to produce specific sticky ends. 

The synthetic DNA is cut with both Sau l and Rsr II 
20 and is ligated to similarly cut dsDNA of pLG3. The 
construct with the correct insert is called pLG4. 

The second step of the construction of the OCV is 
illustrated in Table 28. As in the construction of 

25 pLG4 , two pieces of single-stranded DNA are 
synthesized: a 99 base long fragment of the anti-sense 
strand ending with p25 and a 99 base long fragment 
(starting with pl8) . Both the synthetic dsDNA and dsRF 
pLG4 DNA are cut with both Avr ll and Asu II and are 

30 ligated and used to transform E^. coli. The construct 
carrying this second insert is called pLG5. 

Construction of pLG6 proceeds similarly to the 
construction of pLG5. The sequence is shown in Table 
35 30. The two single stranded segments (one from the 
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anti-sense strand ending with N66 and the other from 
the sense strand starting with the third base of the 
codon for Y58) are synthesized, annealed, and extended 
with Klenow fragment. Both the synthetic DNA and RF 
5 pLG5 are cut with both BssH I and AsuII, purified, and 
the appropriate pieces are ligated and used to 
transform IL. coli . 



The construction of pLG7 is illustrated in Table 
10 32 and proceeds similarly to the constructions of pLG4, 
pLG5, and pLG6. The two single stranded segments (one 
from the anti-sense strand ending with the first base 
of the codon for V110 and the other beginning with 
E101) are synthesized, annealed, and extended with 
15 Klenow fragment. Both the synthetic DNA and RF pLG6 
are cut with both Bbel and AsuII , purified, and the 
appropriate pieces are ligated and used to transform EL. 
coli. The construct with the correct fourth insert is 
called pLG7; the display of BPTI on the outer surface 
20 of LG7 is verified by the methods of Sec. 8. 

M13am429 is an amber mutation of M13 used to 
reduce non-specific binding by the affinity matrix for 
phages derived from M13. Ml3am429 ± s derived by 
25 standard genetic methods (MILL72) from wtM13. 

Phage LG7 is grown on coli strain PE384 in LB 
broth with various concentrations of IPTG added to the 
medium to induce the osp-iobd gene. Phage LG7 is 
30 obtained from cells grown with 0.0, 0.1, 1.0, 10.0 or 
100.0 uM, or 1.0 mM IPTG, harvested (See Sec. 7) by the 
method of Salivar (SALI64) , and concentrated to obtain 
a titre of 10 12 pfu/ml by the method of Messing 
(MESS83) . 



35 
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The preferred method of determining whether LG7 
displays BPTI on its surface (See Sec. 8) is to 
determine whether these phage can retain a labeled 
derivative of trypsin (trp) or anhydro trypsin (AHTrp) 
5 on a filter that allows passage of unbound trp or 
AHTrp. Trypsin contains 10 tyrosine residues and can 
be iodinated with 125j by standard methods; we denote 
the labeled trypsin as "trp*". Labeled anhydrotrypsin 
is denoted as "AHTrp*". Other types of labels can be 

10 used on trp or AHTrp, e.g. biotin or a fluorescent 
label. AHTrp* or trp* is labeled to an activity of 0.3 
uCi/ug. A sample of 10 12 LG7(10 mM IPTG) is mixed with 
1.0 ug of trp* or AHTrp* in 1.0 ml of a buffer of 10 mM 
KC1, adjusted to pH 8.0 with 1 mM K 2 HP0 4 / KH 2 P0 4 . The 

15 mixture is passed through an Amicon MSP1 system fitted 
with a membrane filter that allows passage of proteins 
smaller that M r = 300,000. Filters are soaked in 
buffer containing trp or AHTrp prior to the analysis. 
The filter is washed twice with 0.5 ml of buffer 

20 containing trp or AHTrp. The radioactivity retained on 
the filter is quantitated with a scintillation counter 
or other suitable device. If each virion displays one 
copy of BPTI, then .05 ug of protein can be bound that 
would give rise to 3 x 10 4 disintegrations / minute on 

25 the filter. 

An alternative way to quantitate display of BPTI 
on the surface of LG7 is to use the stoichiometric 
binding between trypsin and BPTI to titrate the BPTI. 

30 A solution that titers 10 12 pfu/ml of a phage is 
approximately 1.6 x 10" 9 M in phage if each virion is 
infective. The ratio of pfu to total phage can be 
determined spectrophotometrically using the molar 
extinction coefficients at 260 nm and 280 nm corrected 

35 for the increased length of LG7 as compared to wtM13. 
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For example, if a 1.0 ml solution that contains 10 12 
pfu of LG7 phage grown with 1.0 mM IPTG inhibits 
trypsin solutions up to 4.8 x 10" 7 M, we calculate that 
there are approximately 300 BPTIs/GP ( i.e. (4.8 x 10" 7 
5 molecules of BPTI/1)/(1.6 x 10" 9 phage/1)). Inhibition 
of a specified concentration of trypsin is most easily 
. measured spectrophotometrically using a peptide-linked 

dye, such as N alpha -benzoyl-Arg-Nan (TSCH87) . 

10 Alternatively, binding to an affinity column may 

be used to demonstrate the presence of BPTI on the 
surface of phage LG7 . An affinity column of 2.0 ml 
total volume having BioRad Affi-Gel 10 (™) matrix and 
30 mg of AHTrp as affinity material is prepared by the 

15 method of BioRad. The void volume (V v ) of this column 
is, by hypothesis, 1.0 ml. This affinity column is 
denoted {AHTrp}. 



A sample of 10 12 M13am429 is applied to {AHTrp} in 

20 1.0 ml of 10 mM KC1 buffered to pH 8.0 with KH 2 P0 4 / 
K 2 HP0 4 . The column is then washed with the same buffer 
until the optical density at 280 nm of the effluent 
returns to base line or 4 x V v have been passed through 
the column, whichever comes first. Samples of LG7 or 

25 LG10 are then applied to the blocked {AHTrp} column at 
10 12 pfu/ml in 1.0 ml of the same buffer. The column 
is then washed again with the same buffer until the 
optical density at 280 nm of the effluent returns to 
base line or 4 x V v have been passed through, whichever 

30 comes first. Following this wash, a gradient of KC1 
from 10 mM to 2 M in 3 x V v , buffered to pH 8.0 with 
phosphate is passed over the column. The first KCl 
gradient is followed by a KCl gradient running from 2 M 
to 5 M in 3 x V v . The second KCl gradient is followed 

35 by a gradient of guanidinium CI from 0.0 M to 2.0 M in 
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2 x Vy in 5 M KC1 and buffered to pH 8.0 with 
phosphate. Fractions of 50 ul are collected and 
assayed for phage by plating 4 ul of each fraction at 
suitable dilutions on sensitive cells. Retention of 
5 phage on the column is indicated by appearance of LG7 
phage in fractions that elute significantly later from 
the column than control phage LG10 or wtM13. A 
successful isolate of LG7 that displays BPTI is 
identified , the bpti insert and junctions are 
10 sequenced, and this isolate is used for further work 
described below. 



If vgDNA is used to obtain a functional fusion 
between a BPTI mutant and M13 CP f vide infra ) , then DNA 

15 from a clonal isolate is sequenced in the regions that 
were variegated. Then gratuitous restriction sites for 
useful restriction enzymes are removed if possible by 
silent codon changes. The sequence numbers of residues 
in OSP-IPBD will be changed by any insertions; 

20 hereinafter , we will, however, denote residues inserted 
after residue 23 as 23a, 23b, etc. Insertions after 
residue 81 will be denoted as 81a, 81b, etc. This 
preserves the numbering of residues between C5 and C55 
of BPTI. Residue C5 of BPTI is always denoted as 28 in 

25 the fusion; residue C55 of BPTI is always denoted as 78 
in the fusion, and the intervening residues have 
constant numbers. 

Should LG7 phage from cells grown with 10 mM IPTG 
30 fail to display BPTI on its surface, we have several 
options. We might try to determine why the 
construction failed to work as expected. There are 
various possible modes of failure, including : a) BPTI 
is not cleaved from the M13 signal sequence, b) BPTI is 
35 cleaved from the M13 CP, and c) the chimeric protein is 
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made and cleaved after the signal sequence, but the 
processed protein is not incorporated into the M13 
coat. BPTI has been secreted from E*. coli (MARK86) ; 
however the M13 coat-protein signal sequence was not 
5 used. Therefore problems stemming from the signal 
sequence are unlikely, but possible. We could 
• determine whether BPTI was present in the periplasm or 

bound to the inner membrane of LG7-infected cells by 
assays using try* or Antry*. 

10 

Proteins in the periplasm can be freed through 
spheroplast formation using lysozyme and EDTA in a 
concentrated sucrose solution (BIRD67, MALA64) . If 
BPTI were free in the periplasm, it would be found in 

15 the supernatant. Try* would be mixed with supernatant 
and passed over a non-denaturing molecular sizing 
column and the radioactive fractions collected. The 
radioactive fractions would then be analyzed by SDS- 
PAGE ' and examined for BPTI -sized bands by silver 

20 staining. 



Spheroplast formation exposes proteins anchored in 
the inner membrane. Spheroplasts are mixed with AHTrp* 
and then either filtered or centrifuged to separate 
25 them from unbound AHTrp*. After washing with 
hypertonic buffer, the spheroplasts are analyzed for 
extent of AHTrp* binding alternatively, membrane 
proteins are analyzed by western blot analysis. 

30 If BPTI is found free in the periplasm, then we 

would expect that the chimeric protein was being 
cleaved both between BPTI and the Ml 3 mature coat 
sequence and between BPTI and the signal sequence. In 
that case, we should alter the BPTI/M13 CP junction by 

35 inserting vgDNA at codons for residues 78-82 of 
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AA_seq2 . 

If BPTI is found attached to the inner membrane, 
then there are two likely explanations. The first is 
5 that the chimeric protein is being cut after the signal 
sequence, but is not being incorporated into LG7 
virion; the treatment would also be to insert vgDNA 
between residues 78 and 82 of AA_seq2. The alternative 
hypothesis is that BPTI could fold and react with 

10 trypsin even if signal sequence is not cleaved. N- 
terminal amino acid sequencing of trypsin-binding 
material isolated from cell homogenate determines what 
processing is occurring. If signal sequence were being 
cleaved, we would use the procedure above to vary 

15 residues between C78 and A82; subsequent passes would 
add residues after residue 81. If signal sequence were 
not being cleaved, we would vary residues between 23 
and 27 of AA_seq2. Subsequent passes through that 
process would add residues after 23. 

20 

If BPTI were found neither in the periplasm nor on 
the inner membrane, then we would expect that the fault 
was in the signal sequence or the signal-sequence-to- 
BPTI junction. The treatment in this case would be to 
25 vary residues between 23 and 27. 

Several experiments that introduce variegation 
into the bpti-aene VIII fusion are possible, including: 

30 1) 3 variegated codons between residues 78 and 82 

using olig#12 and olig#13, 

2) 3 variegated codons between residues 23 and 27 
using olig#14 and olig#15, 



35 
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3) 5 variegated codons between residues 78 and 82 
using olig#13 and olig#12a, 

4) 5 variegated codons between residues 23 and 27 
using olig#15 and olig#14a, 

5) 7 variegated codons between residues 78 and 82 
using olig#13 and olig#12b, and 

6) 7 variegated codons between residues 23 and 27 
using olig#15 and olig#14b. 

To alter the BPTI-M13 CP junction, we introduce 
DNA variegated at codons for residues between 78 and 82 
into the Se& I and Sf_i I sites of pLG7. The residues 
after the last cysteine are highly variable in amino 
acid seguences homologous to BPTI, both in composition 
and length; in Table 25 these residues are denoted as 
G79, G80, and A81. The first part of the M13 CP is 
denoted as A82, E83, and G84. One of the oligo-nts 
olig#12, olig#12a, or olig#12b and the primer olig#13 
are synthesized by standard methods. The oligo-nts 
are: 

residue 75 76 77 78 79 80 81 82 83 
5 ' gc | gag | cGC | ATG | CGT | ACC | TGC| qf k| qfk| qfk | GCT | GAA | - 

84 85 86 87 88 89 90 91 
GGT|GAT|GAT|CCG|GCC|AAA|GCG|GCC|gcg|cc 3' olig#12 

residue 75 76 77 78 79 80 81 81a 81b 
5 1 gc | gag | cGC | ATG | CGT | ACC I TGC I qf k | qfk | qf k | gf k | qf k | - 



82 83 84 85 86 87 
GCT | GAA | GGT | GAT | GAT | CCG | - 



40 



88 89 90 91 
GCC|AAA|GCG|GCC|gcg|cc 3' olig#12a 
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residue 75 76 77 78 79 80 81 81a 81b 
5 ' gc | gag | CGC | ATG | CGT | ACC | TGC | qfk | qfk | qf]c | qf k | qfk | - 

81c 81d 82 83 84 85 86 87 
qfk | qfk | GCT | GAA | GGT | GAT | GAT | CCG [ - 

88 89 90 91 
GCC|AAA|GCG|GCC|gcg|cc 3' olig#12b 



residue 91 90 89 88 87 86 
5 1 gg|cgc|GGC|CGC|TTT|GGC|CGG|ATC 3 1 olig#13 

15 

where q is a mixture of (0.26 T, 0.18C, 0.26 A, and 
0.30 G) , f is a mixture of (0.22 T, 0.16 C, 0.40 A, and 
0.22 G) , and k is a mixture of equal parts of T and G. 

20 The bases shown in lower case at either end are spacers 
and are not incorporated into the cloned gene. The 
primer is complementary to the 3 1 end of each of the 
longer oligo-nts. One of the variegated oligo-nts and 
the primer olig#13 are combined in equimolar amounts 

25 and annealed. The dsDNA is completed with all four 
(nt)TPs and Klenow fragment. The resulting dsDNA and 
RF pLG7 are cut with both Sfi I and Sph I, purified, 
mixed, and ligated. This ligation mixture goes through 
the process described in Sec. 15 in which we select a 

30 transformed clone that, when induced with IPTG, binds 
AHTrp. 



To vary the junction between M13 signal sequence 
and BPTI, we introduce DNA variegated at codons for 

35 residues between 23 and 27 into the Kon I and Xho I 
sites of pLG7. The first three residues are highly 
variable in amino acid sequences homologous to BPTI. 
Homologous- sequences also vary in length at the amino 
terminus. One of the oligo-nts olig#14, olig#14a, or 

40 olig#14b and the primer olig#15 are synthesized by 
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standard methods. The oligo-nts are: 

residue : 17 18 19 20 21 22 23 24 25 

5 5' g | gcc | gcG | GTA | CCG | ATG |; CTG | TCT | TTT | GCT | qfk | qfk | - 

26 27 28 29 30 
|qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga| 3' olig#14 

10 

residue 17 18 19 20 21 22 23 24 25 26 
5 ' g | gcc | gcG | GTA | CCG | ATG | CTG | TCT | TTT | GCT | qf k | qf k | qf k | - 

15 26a 26b 27 28 29 30 

|qfk|qfk|TTC|TGT|CTC|GAG|cgc|ccg|cga| 3' olig#14a, 

20 residue 17 18 19 20 21 22 23 24 25 26 

5 » g | gcc | gcG | GTA | CCG | ATG | CTG | TCT | TTT | GCT | qf k | qf k | qf k | - 

26a 26b 26c 26d 27 28 29 30 

| qf k | qf k | qf k | qf k | TTC | TGT | CTC | GAG | cgc | ccg | cga | 3 • olig# 14b 

25 

5» | teg I egg | gcg | CTC | GAG | ACA | GAA | 3' olig#15 

30 where q is a mixture of (0.26 T, 0.18 C, 0.26 A, and 
0.30 G) , f is a mixture of (0.22 T, 0.16 C, 0.40 A, and 
0.22 G) , and k is a mixture of equal parts of T and G. 
The bases shown in lower case at either end are 
spacers. One of the variegated oligo-nts and the 

35 primer are combined in equimolar amounts and annealed. 
The ds DNA is completed with all four (nt)TPs and 
Klenow fragment. The resulting dsDNA and RF pLG7 are 
cut with both Kpn I and Xho I, purified, mixed, and 
ligated. This ligation mixture goes through the 

40 process described in Sec. 15 in which we select a 
transformed clone that, when induced with IPTG, binds 
AHTrp or trp. 
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If none of these approaches produces a working 
chimeric protein, we may try a different signal 
sequence , or a different OSP in M13 (e.g., the gene III 
protein for which there is fusion data (SMIT85, 
5 CRUZ88)), or another genetic package. 

Example 1, Part II 



BPTI binds very tightly to trypsin 
10 (Ka = 6.0 x 10" 14 M) and to anhydro trypsin, so that 
these molecules are not preferred for optimizing the 
amount of BPTI to display on LG7 or the amount of 
affinity molecule to attach to the column. Tschesche 
et aL=_ reported on the binding of several BPTI 
15 derivatives to various proteases: 

Dissociation constants for BPTI derivatives, Molar. 

Residue Trypsin Chymotrypsin Elastase Elastase 
#15 (bovine (bovine (porcine (human 

20 pancreas) pancreas) pancreas) leukocytes) 

lysine 6.0 x 10" 14 9.0 x 10~ 9 - 3.5 x 10~ 6 

glycine - - + 7.0 x 10~ 9 

alanine + - 2.8 x 10~ 8 2.5 x 10" 9 

valine - - 5.7 x 10~ 8 1.1 xl0~ 10 

25 leucine - - 1.9 x 10~ 8 2.9 x 10~ 9 



From the report of Tschesche et al. we infer that 
molecular pairs marked have K^s greater than 

3.5 x 10" 6 M and that molecular pairs marked have 
30 K^s much greater than 3.5 x 10~ 6 M. Because of the 
wealth of data about the binding of BPTI and various 
mutants to trypsin and other proteases (TSCH87) , we can 
proceed in various ways. (For other PBDs we can obtain 
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15 



20 



25 



30 



two different monoclonal antibodies, one with a high 
affinity having of order 10"* 11 M, and one with a 
moderate affinity having on the order of 10~ 6 M.) 
In this example, we may use: a) the moderate binding 
between BPTI and human leukocyte elastase (HuLEl) , b) 
the moderately strong binding of porcine elastase to 
BPTI(V15), or c) the binding of BPTI(A15) (residue 38 
in the pbd gene) for trypsin (weak but detectable) or 
for porcine pancreatic elastase. 

We compare the retention of LG7 virions to the 
retention of wild-type M13 on {AHTrp} . M13 derivatives 
having more DNA than wild-type Ml 3 have corresponding 
longer virions. Thus we will create pLG8 that differs 
from pLG7 only in having stop codons at codons 2 and 
3, and an altered L codon at codon 7 of the osp-iobd 
gene. Phage LG8 will have exactly as much DNA as LG7; 
therefore the LG8 virion is exactly as long as the LG7 
virion. LG8 can not, however, display BPTI on its 
surface. 

To expedite identification of different M13- 
derived phage, we replace the amp R gene of LG8 with the 
tet R gene from pBR322 by standard methods. The BSMI- 
to-Aatll tet R bearing fragment of pBR322 is ligated 
into DNA from pLG8 cut with Xbal and Aatll. The 
correct construction, having 9.2 kb, is easily 
distinguished from pBR322 and is called LG10. 

The phage LG7 is grown at various levels of IPTG 
in the medium and harvested in the way previously 
described. An affinity column having bed volume of 2.0 
ml and supporting an amount of HuLEl picked from the 
range 0.1 mg to 30.0 mg on 1 ml of BioRad Affi- 
Gel 10 (™) or Affi-Gel 15 (™) is designated {HuLEl} . 
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An appropriate set of densities of HuLEl on the column 
is (0.1 mg/ml, 0.5 mg/ml, 2,0 mg/ml, 8.0 mg/ml, 15.0 
mg/ml, and 30.0 mg/ml) . The Vy of {HuLEl} is, by 
hypothesis, 1.0 ml. The elution of LG7 phage is 
5 compared to the elution of LG10 on {HuLEl} having 
varying amounts of HuLEl affixed. The columns are 
eluted in a standard way: 

1) 10 mM KC1 buffered to pH 8.0 with phosphate, 
10 until optical density at 280nm falls to base line 

or 4 x V v , whichever is first, 

2) a gradient of 10 mM to 2 M KC1 in 3 x V v , pH 
held at 8.0 with phosphate, 

15 

3) a gradient of 2 M to 5 M KCl in 3 x V v , 
phosphate buffer to pH 8.0, 

4) constant 5 M KCl plus 0 to 0.8 M guanidinium CI 
20 in 2 x V v , with phosphate buffer to pH 8.0. 

The preferred level of induction (IPTGoptimai) and 
amount of affinity molecule on the matrix 
(DoAMoMQptjjj^) are those settings that give the 

25 sharpest LG7 elution peak that shows significant 
retardation as compared to LG8, which carries no BPTI. 
By hypothesis, the best separation occurs for the 
amount of BPTI/GP produced when the cells are induced 
with 10.0 uM IPTG and when 4.0 mg HuLEl/ml is applied 

30 to BioRad Affi-Gel 10 (™> . 

When the amount of BPTI/GP and the amount of 
HuLEl/volume of support have been optimized, we turn to 
optimization of elution rate, initial ionic strength, 
35 and the amount of GP/ (volume of support). These 
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parameters can be optimized separately. 



* 
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15 



20 
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Using optimal BPTI/GP and HuLEl/volume of support, 
we measure the elution volume of LG7 and LG8 for 
different elution rates, yi£. 1, 1/2, 1/4, 1/8 and 1/16 
times the maximum flow rate. By hypothesis, 1/4 of 
maximum elution rate is better than 1/2, but 1/8 is 
about the same as 1/4. Therefore 1/4 maximum elution 
rate will be used. 

Elution volumes of LG7 obtained from cells grown 
on media that is 2.0 mM in IPTG are measured at optimal 
DoAMoM and elution rate for loadings of 10 9 , 10 10 , 
10 11 , and 10 12 pfu. By hypothesis, 10 12 pfu of pure 
LG7 overloads the column and significant number of 
phage elute before their characteristic position in the 
KC1 gradient. We also find that 10 11 pfu overloads the 
column only slightly, and that 10 10 pfu does not 
overload the column. Because the use of the affinity 
separation in Sec. 15 will involve a population in 
which no single member is more than one part in 10 4 , we 
conclude that 10 12 pfu of a variegated population could 
be applied to a column of 1.0 ml matrix volume without 
overloading with respect any one species. The 
overloading of a 1.0 ml column by 10 12 pfu also 
indicates that the initial column that captures 
indiscriminately adhesive phage should be 5 to 10 times 
as large as the column that supports the target 
material . 

Elution volumes of LG7 and LG10 obtained from 
cells grown on media that is 2.0 mM in IPTG are 
measured at optimal conditions and for a loading of 
10 10 pfu for various initial ionic strengths: 1.0 mM, 
5.0 mM, 10.0 mM, 20.0 mM, and 50.0 mM. We may find, 
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for example, that LG10 is slightly retarded by the 
column when loaded at 1.0 mM KCl, but that LG7 always 
comes off the column at its characteristic place in the 
gradient. We use 10.0 mM as initial ionic strength in 
all remaining affinity separations. 

To determine the sensitivity of chromatography of 
phage that display variants of BPTI on their surfaces 
(Sec. 10.1), we prepare artificial mixtures of two 
closely- related phage that differ only at one residue 
in the BPTI domain. One variety of phage has strong 
affinity for the column used in this step, while the 
other phage has no affinity for the column. We 
chromatograph these mixtures to discover how little of 
the phage that binds to the column can be detected 
within a large majority of phage that do not bind the 
column. 

For these tests we choose AHTrp as AfM(BPTI) . A 
column having 2 ml bed volume is prepared with 
(DqAMoM optimal mg of AHTrp) / (ml of Affi-Gel 10 (™) ) . 
The column is called {AHTrp} and has Vy ■ 1.0 ml. 

A new phage, LG9, is prepared that displays 
BPTI(V15) as IPBD in contrast to LG7 that displays 
BPTI(K15, wild-type) as IPBD. Residue 15 of BPTI is 
residue 38 of the osp-ipbd gene. We introduce the 
change K38 to V by replacement of a short segment of 
the osp-ipbd gene between Apa I & Stu I. The correct 
construction is called pLG9 . To expedite 
differentiation between LG7 and an LG9-derivative 
phage, we replace the amp^ gene of LG9 with the tet R 
gene from pBR322. DNA from pBR322 between BsmE (1353, 
blunted) and Aatll (1428) is ligated to dsDNA from pLG9 
cut with Xbal (blunted) and Aat ll . The correct 
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construction, having 9.2 kb, is easily distinguished 
from pBR322 and is called LG11. DNA from phage LG11 is 
sequenced in the vicinity the junctions of the newly 
inserted tet R gene to confirm the construction. 

LG7 and LG11 are grown with optimum IPTG (2.0 mM) 
and harvested. Mixtures are prepared in the ratios 



10 



15 



20 



LG7:LG11 :: l^im 

where V lim ranges from 10 10 to 10 5 by factors of 10. 
Large values of are tested first; once a V lim is 

found that allows recovery of LG7, smaller values of 
v lim are not be tested ' 

The column {AHTrp} is first blocked by treatment 
with 10 11 virions of M13am429 in 100 ul of 10 mM KCl 
buffered to pH 8.0 with phosphate; the column is washed 
with the same buffer until OD 2 60 returns to base line 
or 4 x V v have passed through the column, whichever 
comes first. One of the mixtures of LG7 and LG11 
containing 10 12 pfu in 1 ml of the same buffer is 
applied to {AHTrp} . The column is eluted in a standard 
way : 

1) 10 mM KCl buffered to pH 8.0 with phosphate, 
until optical density at 280nm falls to base line 
or 4 x V v , whichever is first, (discard effluent) , 

30 2) a gradient of 10 mM to 2 M KCl in 3 x V v , pH 

held at 8.0 with phosphate, (30 x 100 ul 
fractions) , 



25 



35 



3) a gradient of 2 M to 5 M KCl in 3 x V v , 
phosphate buffer to pH 8.0, (30 x 100 ul 
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fractions) , 

4) constant 5 M KC1 plus 0 to 0.8 M guanidinium Cl 
in 2 x V v , with phosphate buffer to pH 8.0 r (20 x 

5 100 ul fractions) , 

5) constant 5 K KC1 plus 0.8 M guanidinium Cl in 
1.2 x V V/ with phosphate buffer to pH 8.0, (12 x 
100 ul fractions) . 

10 

Samples of 4 ul from each fraction are plated at 
suitable dilution on phage-sensitive Sup* cells (so 
that M13am429 will not grow) . A sample of the column 
15 matrix is also used as inoculum for phage-sensitive 
Sup + cells. Plaques are transferred to ampicillin- 
containing LB agar, and Amp R colonies are tested for 
display of BPTI(K15) by use of trp* or AHTrp*. 

20 By hypothesis, = 4.0 x 10 8 is the largest 

value for which LG7 can be recovered. Thus C sens i = 
4.0 x 10 8 . Three cycles of chromatography are required 
to isolate LG7, so the first approximation to C e ff is 
740 ( - exp( log e (4.0 x 10 8 )/3 ) ). 

25 

We now determine the efficiency of the affinity 
separation (Sec. 10.2). This is done by: a) preparing 
mixtures of LG7 and LG11 in the ratio 1:Q, b) enriching 
the population for LG7 for one separation cycle, and c) 
30 determining the fraction of LG7 in the last phage- 
bearing fraction. When Q is 1.5 x 10 4 , 3% of colonies 
are BPT1 positive. When Q is 1.5 x 10 3 , 60% of the 
colonies are BPTI positive. Thus we calculate C eff = 
.60 x 1.5 x 10 3 = 900. 
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Our hypothetical LG7 should display one or more 
BPTI domains on each virion. The osp-iobd gene is 
under control of the lacUV5 promoter so that expression 
levels of BPTI-M13 CP can be manipulated via [IPTG] . 
5 This construct may be used to develop many different 
binding proteins, all based on BPTI. An optimum level 
of induction and amount of AfM(PBD) (= DoAMoMoptinum = 
2.0 mg/(ml of support)) should have been determined; 
target molecules will be applied to columns in this 
10 amount in the process disclosed in Sec. 15.1. These 
optimum levels may be adequate for all targets and all 
variegations of BPTI displayed on derivatives of M13 
based on LG7, but some further optimization may be 
needed if other values of pH or temperatures are used. 

15 

Other • pbd gene fragments may be substituted for 
the boti gene fragment in pLG7 with a high likelihood 
that PBD will appear on the surface of the new LG7 
derivative. 

20 

Example l. Part III 

HHMb is chosen as a typical protein target; an 
other protein could be used. HHMb satisfies all of the 
25 criteria for a target: 1) it is large enough to be 
applied to an affinity matrix, 2) after attachment it 
is not reactive, and 3) after attachment there is 
sufficient unaltered surface to allow specific binding 
by PBDs. 

30 

The essential information for HHMb is known: 1) 
HHMb is stable at least up to 70°C, between pH 4.4 and 
9.3, 2) HHMb is stable up to 1.6 M Guanidinium CI, 3) 
the pi of HHMb is 7.0, 4) for HHMb, M r - 16,000, 5) 
35 HHMb requires haem, 6) HHMb has no proteolytic 
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activity • 

In addition, the following information about HHMb 
and other myoglobins is available: 1) the sequence of 
5 HHMb, 2) the 3D structure of sperm whale myoglobin 
(HHMb has 19 amino acid differences and it is generally 
assumed that the 3D structures are almost identical) , 
3) its lack of enzymatic activity, 4) its lack of 
toxicity. 

10 

We set the specifications of an SBD as : 
1) T = 25°C 
15 2) pH m 8.0 

3) Acceptable solutes : 

A ) for binding : 

i) phosphate, as buffer, 0 to 20 mM, and 
20 ii) KC1, 10 mM, 

B ) for column elution : 

i) phosphate, as buffer, 0 to 30 mM, 

ii) KC1, up to 5 M, and 

iii) Guanidinium CI, up to 0.8 M. 

25 

4) Acceptable < 1.0 x 10~ 8 M. 
We choose LG7 as GP(IPBD) . 

30 Residues to be varied are picked, in part, through 

the use of interactive computer graphics to visualize 
'the structures. In this section, all residue numbers 
refer to BPTI. We pick a set of residues that forms a 
surface such that all residues can contact one target 

35 molecule • Information relevant to choosing BPTI 
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residues to vary includes: 1) the 3D structure, 2) 
solvent accessibility of each residue (LEEB71) , 3) a 
compilation of sequences of other proteins homologous 
to BFTI, and 4) knowledge of the structural nature of 
5 different amino acid types. 

Tables 16 and 34 indicate which residues of BPTI: 
a) have substantial surface exposure, and b) are known 
to tolerate other amino acids in other closely related 

10 proteins. We use interactive computer graphics to pick 
sets of eight to twenty residues that are exposed and 
variable and such that all members of one set can touch 
a molecule of the target material at one time. If BPTI 
has a small amino acid at a given residue, that amino 

15 acid may not be able to contact the target 
simultaneously with all the other residues in the 
interaction set, but a larger amino acid might well 
make contact. A charged amino acid might affect 
binding without making direct contact. In such cases, 

20 the residue should be included in the interaction set, 
with a notation that larger residues might be useful. 
In a similar way, large amino acids near the geometric 
center of the interaction set may prevent residues on 
either side of the large central residue from making 

25 simultaneous contact. If a small amino acid, however, 
were substituted for the large amino acid, then the 
surface would become flatter and residues on either 
side could make simultaneous contact. Such a residue 
should be included in the interaction set with a 

30 notation that small amino acids may be useful. 

Table 35 was prepared from standard model parts 
and shows the maximum span between C^eta and tilB t ^P of 
each type of side group. ^beta. is used because it is 
35 rigidly attached to the protein main-chain; rotation 
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about the C^p^a-Cketa bond is the most important 
degree of freedom for determining the location of the 
side group. 

5 Table 34 indicates five surfaces that meet the 

given criteria. The first surface comprises the set of 
residues that contacts trypsin in the complex of 
trypsin with BPTI as reported in the Brookhaven Protein 
Data Bank entry "ITPA". This set is indicated by the 
10 number n l". The exposed surface of the residues in 
this set (taken from Table 16) totals 1148 A 2 and the 
approximates the area of contact between BPTI and 
trypsin. 

15 Other surfaces, numbered 2 to 5, were picked by 

first picking one exposed, variable residue and then 
picking neighboring residues until a surface was 
defined. The choice of sets of residues shown in Table 
34 is in no way exhaustive or unique; other sets of 

20 variable, surface residues can be picked. Hereinafter 
we refer to K15 as being at the top of the molecule, 
while the carboxy and amino termini are at the bottom. 

Solvent accessibilities are useful, easily 
25 tabulated indicators of a residue 1 s exposure. Solvent 
accessibilities must be used with some caution; small 
amino acids are under-represented and large amino acids 
over-represented. The user must consider what the 
solvent accessibility of a different amino acid would 
30 be when substituted into the structure of BPTI. 

To create specific binding between a derivative of 
BPTI and HHMb, we will vary the residues in set #2. 
This set includes the twelve principal residues 17 (R) , 
35 19(1), 21(Y), 27(A), 28(G), 29(L), 31(Q) , 32(T), 34(V), 



WO 90/02809 




PCT/US89/03731 



157 

48(A), 49(E), and 52 (M) (Sec* 13.1.1). None of the 
residues in set #2 is completely conserved in the 
sample of sequences reported in Table 34; thus we can 
vary them with a high probability of retaining the 
5 underlying structure. Independent substitution at each 
of these twelve residues of the amino acid types 
observed at that residue would produce approximately 
4.4 x 10 9 amino acid sequences and the same number of 
surfaces . 

10 

BPTI is a very basic protein- This property has 
been used in isolating and purifying BPTI and its 
homologues so that the high frequency of arginine and 
lysine residues may reflect bias in isolation and is 
15 not necessarily required by the structure. Indeed, 
SCI-III from Bombvx mori contains seven more acidic 
than basic groups (SASA84) . 

Residue 17 is highly variable and fully exposed 
20 and can contain R, K, A, Y, H, F, L, M, T, G, Y, P, or 
S. All types of amino acids are seen: large, small, 
charged, neutral, and hydrophobic. That no acidic 
groups are observed may be due to bias in the sample. 

25 Residue 19 is also variable and fully exposed, 

containing P, R, I, S, K, Q, and L. 

Residue 21 is not very variable, containing F or Y 
in 31 of 33 cases and I and W in the remaining cases. 

30 The side group of Y21 fills the space between T32 and 
the main chain of residues 47 and 48. The OH at the 
tip of the Y side group projects into the solvent. 
Clearly one can vary the surface by substituting Y or F 
so that the surface is either hydrophobic or 

35 hydrophilic in that region. It is also possible that 
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acid f viz. H) or the other 
might be tolerated. 



Residue 27 most often contains A, but S, K, L, and 
5 T are also observed. On structural grounds, this 
residue will probably tolerate any hydrophilic amino 
acid and perhaps any amino acid. 

Residue 28 is G in BPTI. This residue is in a 
10 turn, but is not in a conformation peculiar to glycine. 
Six other types of amino acids have been observed at 
this residue: K, N, Q, R, H, and N. Small side groups 
at this residue might not contact HHMb simultaneously 
with residues 17 and 34. Large side groups could 
15 interact with HHMb at the same time as residues 17 and 
34. Charged side groups at this residue could affect 
binding of HHMb on the surface defined by the other 
residues of the principal set. Any amino acid, except 
perhaps P, should be tolerated. 

20 

Residue 2 9 is highly variable, most often 
containing L. This fully exposed position will 
probably tolerate almost any amino acid except, 
. perhaps, P. 

25 

Residues 31, 32, and 34 are highly variable, 
exposed, and in extended conformations; any amino acid 
should be tolerated. 

30 Residues 48 and 49 are also highly variable and 

fully exposed, any amino acid should be tolerated. 

Residue 52 is in an alpha helix. Any amino acid, 
except perhaps P, might be tolerated. 



35 
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Now we consider possible variation of the 
secondary set (Sec. 13. 1.2) of residues that are in the 
neighborhood of -the principal set. Neighboring 
residues that might be varied at later stages include 
5 *9(P), 11 (T), 15(H), 16(A), 18(1), 20 (R) , 22(F), 24 (N) , 
26 (K) , 35 (Y) , 47 (S), 50(D), and 53 (R). 

Residue 9 is highly variable, extended, and 
. exposed. Residue 9 and residues 48 and 49 are 
10 separated by a bulge caused by the ascending chain from 
residue 31 to 34. For residue 9 and residues 48 and 49 
to contribute simultaneously to binding, either the 
target must have a groove into which the chain from 31 
to 34 can fit, or all three residues (9, 48, and 49) 
15 must have large amino acids that effectively reduce the 
radius of curvature of the BPTI derivative. 

Residue 11 is highly variable, extended, and 
exposed. Residue 11, like residue 9, is slightly far 
20 from the surface defined by the principal residues and 
will contribute to binding in the same circumstances . 

Residue 15 is highly varied. The side group of 
residue 15 points away form the face defined by set #2. 
25 Changes of charge at residue 15 could affect binding on 
the surface defined by residue set #2. 

Residue 16 is varied but points away from the 
surface defined by the principal set. Changes in 
30 charge at this residue could affect binding on the face 
defined by set #2, 

Residue 18 is I in BPTI. This residue is in an 
extended conformation and is exposed. Five other amino 
35 acids have been observed at this residue: M, F, L, V, 
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and T. Only T is hydrophilic. The side group points 
directly away from the surface defined by residue set 
#2. Substitution of charged amino acids at this 
residue could affect binding at surface defined by 
5 residue set #2. 

Residue 20 is R in BPTI. This residue is in an 
extended conformation and is exposed. Four other amino 
acids have been observed at this residue: A f S, L, and 
10 Q. The side group points directly away from the 
surface defined by residue set #2. Alteration of the 
charge at this residue could affect binding at surface 
defined by residue set #2. 

15 Residue 22 is only slightly varied, being Y, F, or 

H in 30 of 33 cases. Nevertheless, A, N, and S have 
been observed at this residue. Amino acids such as L, 
M f I, or Q could be tried here. Alterations at residue 
22 may affect the mobility of residue 21; changes in 

20 charge at residue 22 could affect binding at the 
surface defined by residue set #2. 

Residue 24 shows some variation, but probably can 
not interact with one molecule of the target 
25 simultaneously with all the residues in the principal 
set. Variation in charge at this residue might have an 
effect on binding at the surface defined by the 
principal set. 

30 Residue 26 is highly varied and exposed. Changes 

in charge may affect binding at the surface defined by 
residue set #2? substitutions may affect the mobility 
of residue 27 that is in the principal set. 

35 Residue 35 is most often Y, W has been observed. 
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The side group of 35 is buried, but substitution of F 
or W could affect the mobility of residue 34. 

Residue 47 is always T or S in the sequence sample 



the NH of residue 50 in the alpha helix. Nevertheless, 
there is no overwhelming steric reason to preclude 
other amino acid types at this residue. In particular, 
other amino acids the side groups of which can accept 
hydrogen bonds, viz. N, D, Q, and E, may be acceptable 
here. 

Residue 50 is often an acidic amino acid, but 
other amino acids are possible. 

Residue 53 is often R, but other amino acids have 
been observed at this residue. Changes of charge may 
affect binding to the amino acids in interaction set 



From published models (HUBE77, WLOD84) one can see 
that R39 is on the opposite side of BPTI from the 
surface defined by the residues in set #2. Therefore, 
variation at residue 39 at the same time as variation 
of some residues in set #2 is much less likely to 
improve binding that occurs along surface #2 than is 
variation of the other residues in set #2. 

In addition to the twelve principal residues and 
13 secondary residues, there are two other residues, 
30(C) and 33(F), involved in surface #2 that we will 
probably not vary, at least not until late in the 
procedure. These residues have their side groups 
buried inside BPTI and are conserved. Changing these 
residues does not change the surface nearly so much as 



used* The 0 ( 
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probably accepts a hydrogen bond from 
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does changing residues in the principal set. These 
buried, conserved residues do f however, contribute to 
the surface area of surface #2. The surface of residue 
set #2 is comparable to the area of the trypsin-binding 
5 surface. Principal residues 17, 19, 21, 27, 28, 29, 
31, 32, 34, 48, 49, and 52 have a combined solvent- 
accessible area of 946.9 £ 2 • Secondary residues 9, 11, 
15, 16, 18, 20, 22, 24, 26, 35, 47, 50, and 53 have 
combined surface of 1041.7 fi2. Residues 30 and 33 have 
10 exposed surface totaling 38.2 8 2 . Thus the three 
groups' combined surface is 2026.8 & 2 . 

Residue 30 is C in BPTI and is conserved in all 
homologous sequences. It should be noted, however, 
15 that C14/C38 is conserved in all natural sequences, yet 
Marks et al. (MARK87) showed that changing both C14 and 
C38 to A, A or T,T yields a functional trypsin 
inhibitor. Thus it is , possible that BPTI-like 
molecules will fold if C30 is replaced. 

20 

Residue 33 is F in BPTI and in all homologous 
sequences. Visual inspection of the BPTI structure 
suggests that substitution of Y, M, H, or L might be 
tolerated. 

25 

Given our hypothetical affinity separation 
sensitivity, C sens i, we decide to vary six residues 
leaving some margin for errors in the actual base 
composition of variegated bases. To obtain maximal 

30 recognition, we choose residues from the principal set 
that are as far apart as possible. Table 36 shows the 
distances between the beta carbons of residues in the 
principal and peripheral set. R17 and V34 are at one 
end of the principal surface. Residues A27, G28, L29, 

35 A48, E49, and M52 are at the other end, about twenty 
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Angstroms away; of these, we will vary residues 17 , 27, 
29, 34, and 48. Residues 28, 49, and 52 will be varied 
at later rounds. 

5 Of the remaining principal residues, 21 is left to 

later variations. Among residues 19, 31, and 32, we 
arbitrarily pick 19 to vary. 

Unlimited variation of six residues produces 6.4 x 
10 10 7 amino acid sequences. By hypothesis, C sens i is 1 
in 4 x 10 8 . Table 37 shows the programmed variegation 
at the chosen residues. The parental sequence is 
present as 1 part in 5.5 x 10 7 , but the least favored 
sequences are present at only 1 part in 4.2 x 10 9 . 
15 Among single-amino-acid substitutions from the PPBD, 
the least favored is F17-I19-A27-L29-V34-A48 and has a 
calculated abundance of 1 part in 1.6 x 10 8 . Using the 
optimal qfk codon, we can recover the parental sequence 
and all one-amino-acid substitutions to the PPBD if 
20 actual nt compositions come within 5% of programmed 
compositions. The number of transf ormants is M ntv = 
1.0 x 10 9 (also by hypothesis), thus we will produce 
most of the programmed sequences. 

25 The residue numbers above refer to mature BPTI. 

Since Table 25 refers to the pre-M13CP-BPTI protein, 
all mature BPTI sequence numbers have been increased by 
the length of the signal sequence, 23. Thus, we wish 
to vary residues 40, 42, 50, 52, 57, and 71. A DNA 

30 subsequence containing all these codons is found 
between the (AoalV sites at base 191 and the SphI site 
at base 3 09 of the oso-pbd gene. Among Apa l, Drall, 
and PssI/ Apa l is preferred because it recognizes six 
bases without any ambiguity and will cut fewer 

35 sequences in the vgDNA. Gratuitous restriction sites 
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can be avoided in some cases by use of codon ambiguity: 
changing the codon for g5l from GGC to GGT makes it 
impossible to generate an Apa l site at codons 50, 51, 
and 6=52 . 

5 

Each piece of dsDNA to be synthesized needs six to 
eight bases added at either end to allow cutting with 
restriction enzymes and is shown in Table 37. The 
first synthetic base (before cutting with Aoa l and 

10 Sph I) is 184 and the last is 322, There are 142 bases 
to be synthesized. The center of the piece to the 
synthesized lies between Q54 and V57. The overlap can 
not include varied bases, so we choose bases 245 to 256 
as the overlap that is 12 bases long. Note that the 

15 codon for F56 has been changed to TTC to increase the 
GC content of the overlap. The amino acids that are 
being varied are marked as X with a plus over them. 
Codons 57 and 71 are synthesized on the sense (bottom) 
strand. The design calls for M qfk" in the antisense 

20 strand, so that the sense strand contains (from 5 1 to 
3 1 ) a) equal part C and A ( i.e. the complement of k) , 
b) (0.40 T, 0.22 A, 0.22 C, and 0.16 G) f i.e. the 
complement of f ) , and c) (0.26 T, 0.26 A, 0.30 C, and 
0.18 G) . 

25 

Each residue that is encoded by "qfk" has 21 
possible outcomes, each of the amino acids plus stop. 
Table 12 gives the distribution of amino acids encoded 
by "qfk", assuming 5% errors. The abundance of the 
30 parental sequence is the product of the abundances of R 
xIxAxLxVxA. The abundance of the least- 
favored sequence is 1 in 4.2 x 10 9 . 

Olig#27 and olig#28 are annealed and extended with 
35 Klenow fragment and all four (nt)TPs. Both the ds 
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synthetic DNA and RF pLG7 DNA are cut with both Aga I 
and S£h I. The cut DNA is purified and the appropriate 
pieces ligated (See Sec. 14.1) and used to transform 
competent PE383. (Sec. 14.2). In order to generate a 
5 sufficient number of trans formants, we start with 5.0 1 
of cells m 

1) culture L. coli in 5.0 1 of LB broth at 37°C 
until cell density reaches 5 x 10 7 to 7 x 10 7 

10 cells/ml , 

2) chill on ice for 65 minutes, centrifuge the 
cell suspension at 4000g for 5 minutes at 4°C, 

15 3) discard supernatant; resuspend the cells in 

1667 ml of an ice-cold, sterile solution of 60 
mM CaCl2/ 

4) chill on ice for 15 minutes, and then 
20 centrifuge at 4000g for 5 minutes at 4°C, 

5) resuspend cells in 2 x 400 ml of ice-cold, 
sterile 60 mM CaCl 2 ; store cells at 4°C for 24 
hours , 

25 

6) add DNA (100 nq) in 20 ml of litigation or 
TE buffer; mix, inculafe on ice for minutes, 

7) distribute into 200 nl aliquots and heat 
30 shock cells at 42°C for 20 seconds, 

8 ) add 2 00 ml LB broth and incubate at 37°C for 
1 hour,* 



35 



9) add the culture to 2.0 1 of LB broth 
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containing ampicillin at 35-100 ug/ml and 
culture overnight at 37°C, 



10) after 6 hours, remove 200 ml and plate 0.5 
5 ml portions with log phase JM 107 on LB agar, 

using the soft-agar overlay technique. Phage 
are prepared from the soft agar, 

11) centrifuge the overnight culture to remove 
10 cells, and pellet phage (MESS83), 

12) harvest virions by method of Salivar, et 
al. (SALI64). 



15 

It is important to: a) use all or nearly all the 
vgDNA synthesized in ligation, b) use all or nearly all 
the ligation mixture to transform cells, and c) culture 
all or nearly all the' trans formants. These measures 
20 are directed at maintaining diversity. 

It is important to collect virions in a way that 
samples all or nearly all the trans formants. Because 
F~ cells are used in the transformation, multiple 
25 infections do not pose a problem in the overnight phage 
production. F 1 cells are used for phage production in 
agar. 

HHMb has a pi of 7.0 and we carry out 
30 chromatography at pH 8.0 so that HHMb is slightly 
negative while BPTI and most of its mutants are 
positive. HHMb is fixed (Sec. 15.1) to a 2.0 ml column 
on Affi-Gel 10 (™) or Affi-Gel 15 (™) at 4.0 mg/ml 
support matrix, the same density that is optimal for a 
35 column supporting trp. 
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To remove variants of BPTI with strong, 
indiscriminate binding for any protein or for the 
support matrix (Sec. 15.2), we pass the variegated 
population of virions over a column that supports 
bovine serum albumin (BSA) before loading the 
population onto the {HHMb} column. Affi-Gel 10 (™) or 
Affi-Gel 15 (™) is used to immobilize BSA at the 
highest level the matrix will support. A 10.0 ml 
column is loaded with 5.0 ml of Af f i-Gel-linked-BSA; 
this column, called {BSA}, has V v - 5.0 ml. The 
variegated population of virions containing 10 12 pfu in 
1 ml (0.2 x V v ) of 10 mM KC1, 1 mM phosphate, pH 8.0 
buffer is applied to {BSA}. We wash {BSA} with 4.5 ml 
(0.9 x V v ) of 50 mM KCl, 1 mM phosphate, pH 8.0 buffer. 
The wash with 50 mM salt will elute virions that adhere 
slightly to BSA but not virions with strong binding. 
The pooled effluent of the {BSA} column is 5.5 ml of 
approximately 13 mM KCl. 

The column {HHMb} is first blocked by treatment 
with 10 11 virions of M13(am429) in 100 ul of 10 mM KCl 
buffered to pH 8.0 with phosphate; the column is washed 
with the same buffer until OD 2 60 returns to base line 
or 2 x V v have passed through the column, whichever 
comes first. The pooled effluent from {BSA} is added 
to {HHMb} in 5.5 ml of 13 mM KCl, 1 mM phosphate, pH 
8.0 buffer. The column is eluted (Sec. 15.3) in the 
following way: 

1) 10 mM KCl buffered to pH 8.0 with phosphate, 
until optical density at 280nm falls to base line 
or 2 x V v , whichever is first, (effluent 
discarded) , 



35 
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2) a gradient of 10 mM to 2 M KCl in 3 x V v , pH 
held at 8.0 with phosphate, (30 x 100 /xl 
fractions) , 

3) a gradient of 2 M to 5 M KCl in 3 x V v , 
phosphate buffer to pH 8.0 (30 x 100 /il 
fractions) , 



4) constant 5 M KCl plus 0 to 0.8 M guanidinium Cl 
in 2 x V V/ with phosphate buffer to pH 8.0, (20 x 
100 fil fractions) , and 

5) constant 5 M KCl plus 0.8 M guanidinium Cl in 1 
x V v , with phosphate buffer to pH 8.0, (10 x 100 
til fractions) . 

In addition to the elution fractions, a sample is 
removed from the column and used as an inoculum for 
phage-sensitive Sup + cells (Sec. 15.4). A sample of 4 
/il from each fraction is plated on phage-sensitive Sup + 
cells. Fractions that yield too many colonies to count 
are replated at lower dilution. An approximate titre 
of each fraction is calculated. Starting with the last 
fraction and working toward the first fraction that was 
titered, we pool fractions until approximately 10 9 
phage are in the pool, i.e. about 1 part in 1000 of the 
phage applied to the column. This population is 
infected into 3 x 10 11 phage-sensitive PE384 in 300 ml 
of LB broth. The low multiplicity of infection is 
chosen to reduce the possibility of multiple infection. 
After thirty minutes, viable phage have entered 
recipient cells but have not yet begun to produce new 
phage. Phage-born genes are expressed at this phase, 
and we can add ampicillin that will kill uninfected 
cells. These cells still carry F-pili and will absorb 
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phage helping to prevent multiple infections. 

If multiple infection should pose a problem that 
cannot be solved by growth at low multiple-of- 
5 infection on F + cells, the following procedure can be 
employed to obviate the problem. Virions obtained from 
the affinity separation are infected into F+ JL. coli 
and cultured to amplify the genetic messages (Sec. 
15.5). CCC DNA is obtained either by harvesting RF DNA 
10 or by in vitro extension of primers annealed to ss 

phage DNA. The CCC DNA is used to transform F~ cells 
at a high ratio of cells to DNA. Individual virions 
obtained in this way should bear proteins encoded only 
by the DNA within. 

15 

The variegation produces as many as 6.4 x 10 7 
different amino-acid sequences. C eff is 900. Thus, 
after two separation cycles, the probability of 
isolating a single SBD is less than 0.10; after three 
20 cycles, the probability rises above 0.10. 

The phagemid population is grown and 
chromatographed three times and then examined for SBDs 
(Sec. 15.7). In each separation cycle, phage from the 

25 last three fractions that contain viable phage are 
pooled with phage obtained by removing some of the 
support matrix as an inoculum. At each cycle, about 
10 12 phage are loaded onto the column and about 10 9 
phage are cultured for the next separation cycle. 

30 After the third separation cycle, 32 colonies are 
picked from the last fraction that contained viable 
phage; phage from these colonies are denoted SBD1, 
SBD2 , . . . , and SBD32 . 

35 Each of the SBDs is cultured and tested for 
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retention on a Pep-Tie column supporting HHMb (Sec. 
15.8). Phage LG7(SBD11) shows the greatest retention 
on the Pep-Tie {HHMb} column, eluting at 367 mM KC1 
while wtM13 elutes at 20 mM KC1. SBD11 becomes the 
5 parental amino-acid sequence to the second variegation 
cycle. 

The result of this hypothetical experiment is ^ 
shown in Table 38. R40 changed to D f 142 changed to 
10 Q, A50 changed to E f L52 remained L, and A71 changed to 
W. 



The next round of variegation (Sec. 16) is 
illustrated in Table 39. The residues to be varied are 

15 chosen by: a) choosing some of the residues in the 

principal set that were not varied in the first round 
f viz. residues 42, 44, 51, 54, 55, 72, or 75 of the 
fusion) , and b) choosing some residues in the secondary 
set. Residues 51, 54, 55, and 72 are varied through 

20 all twenty amino acids and, unavoidably, stop. Residue 
44 is only varied between Y and F. Some residues in 
the secondary set are varied through a restricted 
range? primarily to allow different charges (+, 0, -) 
to appear. Residue 38 is varied through K, R, E, or G. 

25 Residue 41 is varied through I, V, K, or E. Residue 43 
is varied through R, S, G, N, K, D, E, T, or A. 

01ig#29 and olig#30 are synthesized, annealed, 
extended and cloned into pLG7 at the Aoa I/ Sph I sites. 

30 The ligation mixture is used to transform 5 1 of 

competent PE383 cells so that 10 9 transformants are 
obtained. A new {HHMb) is constructed using the same 
support matrix as was used in round 1. A sample of 
10 12 of the harvested LG7 are applied to {HHMb} and 

35 affinity separated. The last 10 9 phage off the column 
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and an inoculum are pooled and cultured. The cultured 
phagemids are re-chromatographed for three separation 
cycles. Thirty-two clonal isolates (denoted SBD11-1, 
SBD11-2,..., SBD11-32) are obtained from the effluent 
5 of the third separation cycle and tested for binding on 
a Pep-Tie {HHMb} column. Of this set, SBD11-23 shows 
the greatest retention on the Pep-Tie {HHMb} column, 
eluting at 692 mM KC1. 

10 The results of this hypothetical selection is 

shown in Table 40. Residue 38 (K15 of BPTI) changed to 

E, 41 becomes V, 43 goes to N, 44 goes to F, 51 goes to 

F, 54 goes to S, 55 goes to A, and 72 goes to Q. 

IS The sbdii-23 portion of the osp-pbd gene is cloned 

into an expression vector and BPTI(E15, D17, V18, Q19, 
N20, F21, E27, F28, L29, S31, A32, S34, W71, Q72) is 
expressed in the periplasm. This protein is isolated 
by standard methods and its binding to HHMb is tested. 

20 K d is found to be 4.5 x 10~ 7 M. 

A third round of variation, using SBD11-23 as 
PPBD, is illustrated in Table 41; eight amino acids are 
varied. Those in the principal set, residues 40, 55, 

25 and 57, are varied through all twenty amino acids. 
Residue 32 is varied through P, Q, T, K, A, or E. 
Residue 34 is varied through T, P, Q, K, A, or E. 
Residue 44 is varied through F, L, Y, C, W, or stop. 
Residue 50 is varied through E, K, or Q. Residue 52 is 

30 varied through L, F, I, M, or V. 

The result of this variation is shown in Table 42. 
The selected SBD is denoted SBD11-23-5 and elutes from 
a Pep-Tie {HHMb} column at 980 mM KC1. The sbdll-23-5 
35 segment is cloned into an expression vector and 
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BPTI(E9, Qll, E15, A17, V18, Q19, N20, W21, Q27, F28, 
M29, S31, L32, H34, W71, Q72) is produced. This time 
the % is 7.3 x 1CT 9 M. 

5 This example is hypothetical. It is anticipated 

that more variegation cycles will be needed to achieve 
dissociation constants of 10" 8 M. It is also possible 
that more than three separation cycles will be needed 
in some variegation cycles. Real DNA chemistry and DNA 
10 synthesizers may have larger errors than our 

hypothetical 5%. If S err > 0.05, then we may not be 
able to vary six residues at once. Variation of 5 
residues at once is certainly possible. 
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•Table 2: Preferred Outer-Surface Proteins 



Genetic 
Package 



Preferred 
Outer-Surface 
Protein 



Reason for preference 



M13 



coat protein a) exposed amino terminus, 
(gpVIII) b) predictable post- 

translational 

processing, 
c) numerous copies in 
virion . 



qp HI 



aV fusion data available. 



PhiX174 G protein a) known to be on virion 

exterior, 
b) small enough that 
the G-ipbd gene can 

; replace H gene, 



E. coli LamB a) fusion data available, 
b) non-essential. . 

B^. subtilis CotC a) no post-translational 

spores processing, 

b) distinctive sdequence 
that causes protein to 
localize in spore coat, 
c) non-essential. 
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Table 7: Atomic radii 
Angstroms 

c alpha 

°carbonyl 

N amide 
Other atoms 



Table 8 

Fraction of DNA molecules having 
5 n non-parental bases when 

reagents that have fraction 
M of parental nt. 



1.70 
1.52 
1.55 
1.80 



M 


.9965 


.97716 


.92612 


.8577 


.79433 


.63096 


fO 


.9000 


.5000 


.1000 


.0100 


.0010 


.000001 


fl 


.09499 


.35061 


.2393 


.04977 


.00777 


.0000175 


f2 


.00485 


.1188 


.2768 


.1197 


.0292 


.000149 


f3 


.00016 


. 0259 


.2061 


.1854 


.0705 


.000812 


f 4 . 


000004 


.00409 


.1110 


.2077 


.1232 


.003207 


f8 


0. 


2xl0~ 7 


.00096 


.0336 


.1182 


.080165 


fl6 


0. 


0. 


0. 


5X10" 7 


.00006 


.027281 


f23 


0. 


0. 


0. 


0. 


0. 


.0000089 


most 0 


0 


2 


5 


7 


12 



"most" is the value of n having the highest 
probability. 

25 



WO 90/02809 



PCT/US89/03731 



207 

Table 9: best vgCodon 

Program "Find Optimum vgCodon." 
5 INITIALIZE-MEMORY-OF-ABUNDANCES 

DO ( tl = 0.21 to 0.31 in steps of 0.01 ) 
. DO ( cl * 0.13 to 0.23 in steps of 0.01 ) 
. . DO ( al = 0.23 to 0.33 in steps of 0.01 ) 
Comment calculate gl from other concentrations 
10 . . . gl = 1.0 - tl - cl - al 

. . . IF( gl .ge. 0.15 ) 

. . . . DO ( a2 = 0.37 to 0.50 in steps of 0.01 ) 

DO ( c2 = 0.12 to 0.20 in steps of 0.01 ) 

Comment Force D+E - R + K 

15 g2 = (gl*a2 -.5*al*a2)/ (cl+o.5*al) 

Comment Calc t2 from other concentrations. 

t2 = 1. - a2 - c2 - g2 

IF(g2.gt. O.l.and. t2.gt.0.1) 

....... CALCULATE -ABUNDANCES 

20 COMPARE -ABUNDANCES -TO-PREVIOUS -ONES 

end_IF_block 

end_DO_loop ! c2 

end_DO_loop 1 a2 

end_IF_block ! if gl big enough 

25 . . . . end_DO_loop I al 

. . .end_D0_loop 1 cl 

. . end_DO_loop ! tl 

WRITE the best distribution and the abundances. 
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Table 10: Abundances obtained 
from optimum vgCodon 

Amino Amino 



20 



dClQ 


AJjunaa.ncB 


aula 


Ajjiinaance 


A 


4.80% 


c 


2.86% 


D 


6.00% 


E 


6.00% 


F 


2.86% 


G 


6.60% 


H 


3.60% 


I 


2.86% 


K 


5.20% 


L 


6.82% 


M 


2.86% 


N 


5.20% 


P 


2.88% 


Q 


3.60% 


R 


6.82% 


s 


7.02% mfaa 


T 


4.16% 


V 


6.60% 


W 


2.86% lfaa 


Y 


5.20% 


stop 


5.20% 






ratio 


= Abun(W)/Abun(S) 


= 0.4074 





25 



i 


(Vratio)j 


(ratio) j 


stop-free 


i 


2.454 


.4074 


.9480 


2 


6.025 


.1660 


.8987 


3 


14.788 


.0676 


.8520 


4 


36.298 


. 0275 


.8077 


5 


89.095 


.0112 


.7657 


6 


218.7 


4.57 X 10~ 3 


.7258 


7 


536.8 


1.86 X 10~ 3 


.6881 


lfaa 


= least - favored 


amino-acid 




mfaa 


= most - favored 


amino-acid 
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Table 11: Calculate worst codon. 



Program "Find worst vgCodon within Serr of given 
5 distribution." 

INITIALIZE-MEMORY-OF-ABUNDANCES 
Comment Serr is % error level. 
READ Serr 

Comment Tli,Cli,Ali, Gli, T2i,C2i,A2i,G2i, T3i,G3i 
10 Comment are the intended nt-distribution. 
READ Tli, Cli, Ali, Gli 
READ T2i, C2i, A2i, G2i 
READ T3i, G3i 
Fdwn = l.-Serr 
15 Fup = l.+Serr 

DO ( tl - Tli*Fdwn to Tli*Fup in 7 steps) 
. DO ( cl « Cli*Fdwn to Cli*Fup in 7 steps) 
. . DO ( al = Ali*Fdwn to Ali*Fup in 7 steps) 
gl = 1. - tl - cl - al 
20 ... IF ( (gl-Gli)/Gli .It. -Serr) 

Comment gl too far below Gli, push it back 

• gl = Gli*Fdwn 

. factor - (l.-gl)/(tl + cl + al) 
. tl = tl*factor 
25 . . . . cl = cl*factor 

. al = al*factor 
. • end_IF_block 
IF( (gl-Gli)/Gli .gt. Serr) 
Comment gl too far above Gli, push it back 
30 . . . . gl » Gli*Fup 

• factor - (l.-gl)/(ti + cl + al) 
. tl = tl*factor 

• cl « cl*factor 
. al = al*factor 

35 end IF block 
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Table 11, continued. 



. . . DO ( a2 = A2i*Fdwn to A2i*Fup in 7 steps) 
5 Table 11 , continued. 

. . . . DO ( c2 = C2i*Fdwn to C2i*Fup in 7 steps) 

DO (g2=G2i*Fdwn to G2i*Fup in 7 steps) 

Comment calc t2 from other concentrations. 

10 t2 = 1. - a2 - c2 - g2 

IF( (t2-T2i)/T2i .It. -Serr) 

Comment t2 too far below T2i, push it back 

. t2 » T2i*Fdwn 

factor - (l.-t2)/(a2 + c2 + g2) 

15 ....... a2 = a2*f actor 



c2 ■> c2* factor 

g2 = g2* factor 

end_IF_block 

IF( (t2-T2i)/T2i .gt. Serr) 

20 Comment t2 too far above T2i, push it back 
t2 = T2i*Fup 

factor = (l.-t2)/(a2 + c2 + g2) 

Table 11, continued. 

25 a2 - a2*factor 

c2 bb c2*factor 

g2 = g2* factor 

end_IF_block 

IF(g2.gt. 0.0 .and. t2.gt.0.0) 



30 . t3 ■ 0.5*(l.-serr) 

g3 = l. - t3 

CALCULATE -ABUNDANCES 

....... COMPARE -ABUNDANCES -TO-PREVIOUS -ONES 

. t3 = 0.5 

35 . . . . . . . g3 = 1. - t3 
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Table 11 , continued. 



CALCULATE -ABUNDANCES 

COMPARE -ABUNDANCES -TO-PREVTOUS -ONES 

5 t3 = 0.5*(1.+Serr) 

g3 = 1. - t3 

CALCULATE-ABUNDANCES 

Table 11, continued* 



10 COMPARE -ABUNDANCES -TO- PREVIOUS -ONES 

. . . end_IF_block 

end_DO_loop 1 g2 

end_DO_loop ! c2 

. . . . . end_DO_loop 1 a2 
15 .... end_DO_loop I al 

. . . end_DO_loop ! cl 

. . end_DO_loop ! tl 

WRITE the WORST distribution and the abundances. 
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Table 12: Abundances obtained 
using optimum vgCodon assuming 
5% errors 

Amino Amino 

acid Abundance acid Abundance 

A 4.59% C 2.76% 

D 5.45% E 6.02% 

F 2.49% lfaa 6 6.63% 

H 3.59% I 2.71% 

K 5.73% L 6.71% 

M 3.00% N 5.19% 

P 3.02% Q 3.97% 

R 7.68% mfaa S 7.01% 

T 4.37% V • 6.00% 

W 3.05% Y 4.77% 

stop 5.27% 



ratio = Abun(F)/Abun(R) = 0.3248 



i ( 1/ratio) J (ratio) 3 stop-free 

1 3.079 .3248 .9473 

• 2 9.481 .1055 .8973 

3 29.193 .03425 .8500 

4 89.888 .01112 .8052 

5 276.78 3.61 X 10~ 3 .7627 

6 852.22 1.17 X 10" 3 .7225 

7 2624.1 3.81 X 10~ 4 .6844 



9 
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Table 13: BPTI Homologues 



R # 


1 


2 


3 


4 


5 


6 


7 


8 


9 


10 


11 


12 


13 


14 


15 


16 


17 


18 


19 


-3 








































-2 


_ 


mm 


— 


Q 


T 


— 


— 


— 


- 


— 


- 


Q 


- 


- 


- 


H 


G 


Z 


— 


-1 


— 


— 


— 


T 


E 


— 


— 


— 


— 


— 


- 


P 


- 


- 


- 


D 


D 


G 


- 


1 


R 


R 


R 


P 


R 


R 


R 


R 


R 


R 


R 


L 


A 


R 


R 


R 


K 


R 


A 


2 


P 


P 


P 


P 


P 


P 


P 


P 


P 


P 


P 


R 


A 


P 


P 


P 


R 


P 


A 


3 


D 


D 


D 


D 


D 


D 


D 


D 


D 


D 


D 


K 


K 


D 


R 


T 


D 


S 


K 


4 


F 


F 


F 


L 


F 


F 


F 


F 


F 


F 


F 


L 


Y 


F 


F 


F 


I 


F 


Y 


5 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


6 


L 


L 


L 


Q 


L 


L 


L 


L 


L 


L 


L 


I 


K 


E 


E 


N 


R 


N 


K 


7 


E 


E 


E 


L 


E 


E 


E 


E 


E 


E 


E 


L 


L 


L 


L" 


L 


L 


L 


L 


8 
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P 
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P 
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V 


10 


Y 
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A 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


N 


R 


E 


E 


E 


E 


E 


R 


11 


T 


T 
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R 


T 
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T 
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Y 
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G 


G 


G 


13 


P 


P 


P 


P 


P 


P 


P 


P 


P 


P 


P 


R 


P 


L 


L 


R 


P 


P 


P 


14 


C 


T 


A 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


15 


K 


K 


K 


K 


K 


V 


G 


A 


L 


I 


K 


Y 


K 


K 


K 


R 


K 


K 


K 


16 


A 


A 


A 


A 


A 


A 


A 


A 


A 


A 


A 


Q 


R 


A 


A 


G 


G 


A 


K 


17 


R 


R 


R 


A 


A 


R 


R 


R 


R 


R 


R 


K 


K 


Y 


R 


H 


R 


S 


K 


18 
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I 


I 


L 
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I 


I 


I 


I 


I 


I 


I 


I 


I 


I 


L 


I 


F 
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I 
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I 
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I 


I 


I 


I 


I 


I 


I 


P 


P 


R 


R 


R 


P 


R 


P 
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24 
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N 
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c 
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31 
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37 
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G 


G 
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C 


C 


C 


C 
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C 


C 


C 


C 


C 


C 


C 
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R 
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R 


R 


R 


R 


R 


R 


R 


G 


G 


G 


G 


G 


K 


R 


G 
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A 
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G 
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G 
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Table 13, continued. 



K f 


-L 


o 


•> 

■3 


A 




o 


7 


Q 
O 


q 


10 

JL V 


11 

X J- 


12 


13 


14 


15 


16 


17 


18 


19 


A A. 


ii 




ri 

Xi 


Fi 


W 
r» 




W 
x« 


N 


N 

XI 


A* 


N 


R 


R 


R 


R 


N 


N 


R 


R 
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Q 


























R 






61 









































62 - - - D 

63 - - - K 

64 - - - S 



R # = residue number 

1 BPTI 

2 Engineered BPTI From MARK87 

3 Engineered BPTI From MARK87 

4 Bovine Colostrum (DUFT85) 

5 Bovine Serum (DUFT85) 

6 Semisynthetic BPTI, TSCH87 

7 Semisynthetic BPTI, TSCH87 

8 Semisynthetic BPTI, TSCH87 

9 Semisynthetic BPTI, TSCH87 

10 Semisynthetic BPTI, TSCH87 

11 Engineered BPTI, AUER87 

12 Dendroaspis polvlepis polvleois (Black mamba) venom I 
(DUFT85) 

13 Dendroaspis polvlepis polylepis (Black Mamba) venom K 
(DUFT85) 

14 Hemachatus hemachates (Ringhals Cobra) HHV II 
(DUFT85) 

15 Naia nivea (Cape cobra) NNV II (DUFT85) 

16 Vipera russelli (Russel's viper) RW II (TAKA74) 

17 Red sea turtle egg white (DUFT85) 

18 Snail mucus ( Helix pomania ) (WAGN78) 

19 Dendroaspis ancrusticeps (Eastern green mamba) 
C13 SI C3 toxin (DUFT85) 
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Table 13, continued. 



R # 
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D 


D 


K 


N 


N 


N 


25 


Q 


K 


W 


S 


P 


S 


S 


G 


A 


T 


P 


A 


T 


Q 


26 


K 


G 


A 


A 


A 


H 


S 


T 


V 


R 


S 


K 


R 


E 


27 


K 


A 


A 


s 


S 


L 


s 


S 


K 


L 


A 


A 


T 


T 


28 


K 


N 


K 


N 


N 


H 


K 


M 


G 


K 


K 


G 


K 


K 


29 


Q 


K 


K 


K 


K 


K 


R 


A 


K 


T 


R 


F 


Q 


N 


30 


c 


c 


c 


c 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


31 


E 


Y 


Q 


N 


E 


Q 


E 


E 


V 


K 


V 


E 


E 


E 


32 


R 


P 


L 


K 


K 


K 


K 


T 


L 


A 


Q 


T 


P 


E 


33 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


34 


D 


T 


H 


I 


I 


N 


I 


Q 


P 


Q 


R 


V 


K 


I 


35 


W 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


Y 


36 


S 


S 


G 


G 


G 


G 


G 


G 


G 


R 


G 


G 


G 


G 


37 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


38 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


39 


G 


R 


K 


P 


R 


G 


G 


M 


Q 


D 


D 


K 


K 


Q 


40 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


G 


A 


G 


G 


41 


N 


N 


N 


N 


N 


N 


N 


N 


N 


D 


D 


K 


N 


N 


42 


S 


A 


A 


A 


A 


A 


A 


G 


G 


H 


H 


S 


G 


D 


43 


N 


N 


N 


N 


N 


N 


N 


N 


N 


G 


G 


N 


N 


N 
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Table 13, continued. 



R # 


20 


21 


22 


23 


24 


25 


26 


27 


28 


29 


30 


31 


32 


33 


44 


R 


R 


R 


N 


N 


N 


N 


N 


K 


N 


N 


N 


R 


R 


45 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


F 


Y 


F 


46 - 


K 


K 


S 


K 


K 


K 


H 


V 


Y 


K 


K 


R 


K 


S 


47 


T 


T 


T 


T 


T 


T 


T 


T 


S 


T 


S 


S 


S 


T 


48 


I 


. I 


I 


W 


W 


I 


L 


E 


E 


E 


D 


A 


E 


L 


49 


E 


E 


E 


D 


D 


D 


E 


K 


K 


T 


H 


E 


Q 


A 


50 


E 


E 


K 


E 


E 


E 


E 


E 


E 


L 


L 


D 


D 


£ 


51 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


52 


R 


R 


R 


R 


R 


Q 


E 


L 


R 


R 


R 


H 


L 


E 


53 


R 


R 


H 


Q 


H 


R 


K 


Q 


E 


C 


C 


R 


D 


Q 


54 


T 


T 


A 


T 


T 


T 


V 


T 


Y 


E 


E 


T 


A 


X 


55 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


C 


c 


56 


I 


V 


V 


G 


V 


A 


G 


R 


G 


L 


E 


G 


S 


i 


57 


G 


V 


G 


A 


A 


A 


V 




V 


V 


L 


G 


G 


N 


58 








S 


S 


K 


R 




P 


Y 


Y 


A 


F 




59 








A 


G 


Y 


S 




G 


P 


R 








60 










I 


G 






D 













20 Dendroaspis anausticeps (Eastern Green 
Mamba) C13 S2 C3 toxin (DUFT85) 

21 Dendroaspis polylepis polvlepes (Black 
mamba) B toxin (DUFT85) 

22 Dendroaspis polvlepis polvlepes (Black 
Mamba) E toxin (DUFT85) 

23 Vioera ammodvtes TI toxin (DUFT85) 

24 Vipera ammodvtes CTI toxin (DUFT85) 

25 Bunaarus fasciatus VIII B toxin (DUFT85) 

26 Anemonia sulcata (sea anemone) 5 II 
(DUFT85) 

27 Homo sapiens HI -14 "inactive" domain 
(DUFT85) 

28 Homo sapiens HI-14 "active" domain 
(DUFT85) 

29 beta bungarotoxin Bl (DUFT85) 

30 beta bungarotoxin B2 (DUFT85) 

31 Bovine spleen TI II (FI0R85) 

32 Tachvoleus tridentatus (Horseshoe crab) 
hemocyte inhibitor (NAKA87) 

33 B Qflfry x fflp r j, (silkworm) SCI-III (SASA84) 

Notes : 

a) both beta bungarotoxins have residue 15 deleted. 

b) B. mori has an extra residue between C5 and C14; we 
have assigned F and G to residue 9. 

c) all natural proteins have C at 5, 14, 30, 38, 50, & 55. 

d) all homologues have F33 and G37. 

e) extra C's in bungarotoxins form interchain cystine 
bridges 
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Table 14: Tally of lonizable Groups. 
BPTI homologues. 



Sequence 



Identifier 


D 


E 


K 


1 


2 


2 


4 


2 


2 


2 


4 


3 


2 


2 


4 


4 


2 


4 


2 


5 


2 


4 


4 




2 


2 


3 


7 


2 


2 


3 


8 


2 


2 


3 


9 


2 


2 


3 


10 


2 


2 


3 


11 


2 


3 


4 


12 


0 


3 


7 


13 


1 


2 


8 


14 


2 


3 


2 


15 


1 


4 


2 


16 


2 


5 


3 


17 


2 


4 


6 


18 


1 


1 


2 


19 


0 


2 


9 


20 


2 


3 


6 


21 


0 


3 


3 


22 


0 


2 


6 


23 


4 


1 


5 


24 


3 


2 


4 


25 


1 


2 


5 


26 


1 


5 


4 


27 


1 


4 


2 


28 


2 


3 


4 


29 


6 


2 


5 


30 


6 


2 


6 


31 


2 


3 


5 


32 


3 


3 


5 


33 


4 


7 


3 



R 


Y 


H 


NH 


C02 


+ 


# 


6 


4 


0 


1 


1 


6 


16 


6 


4 


0 


1 


1 


6 


16 


6 


4 


0 


1 


1 


6 


16 


3 


3 


0 


1 


1 


-1 


13 


4 


4 


0 


1 


1 


2 


16 


6 


4 


0 


1 


1 


5 


15 


6 


4 


0 


1 


1 


5 


15 


6 


4 


0 


1 


1 


5 


15 


6 


4 


0 


1 


1 


5 


15 


6 


4 


0 


1 


1 


5 


15 


6 


4 


0 


1 


1 


5 


19 


7 


3 


1 


1 


1 


11 


19 


5 


4 


0 


1 


1 


10 


18 


5 


3 


1 


1 


1 


2 


14 


7 


2 


2 


1 


a 


4 


16 


7 


3 


2 


1 


i 


3 


19 


7 


3 


0 


1 


i 


7 


21 


4 


4 


0 


1 


i 


4 


8 


4 


4 


0 


1 


i 


11 


17 


7 


3 


1 


1 


i 


8 


20 


5 


5 


0 


1 


i 


5 


13 


3 


3 


2 


1 


i 


7 


13 


3 


4 


2 


1 


i 


3 


15 


6 


5 


1 


1 


i 


5 


17 


3 


3 


1 


1 


i 


5 


13 


4 


4 


1 


1 


i 


2 


16 


2 


4 


0 


1 


i 


-1 


11 


3 


3 


0 


1 


i 


2 


14 


7 


4 


2 


1 


i 


4 


22 


7 


4 


2 


1 


i 


5 


23 


4 


4 


0 


1 


i 


4 


16 


5 


4 


0 


1 


i 


4 


18 


1 


4 


0 


1 


i 


-7 


17 



Sequences given in Table 10. 

+ is sum ofK + R + NH-D-E - C02, approximate charge on 
molecule at pH 7.0 

# is sum ofK + R + NH + D+ E + C02, i.e. number of ionized 
groups at pH 7.0. 
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Table 15: Amino acids observed at each Residue 
BPTI homologues 





Number 












Different 










Res. # 


AAs 


Contents 




BPTI 


-5 


2 


D -32 








2 


E -32 




_ 


—3 


5 


T P 


F 2 -29 






Cm 


10 


Z3 R3 Q2 T2 H G L K E -18 




X 


10 


D4 T2 P2 Q2 E G N K R -18 




X 


10 

XV/ 


R21 


A2 K2 H2 P L I 


T G D 




2 

a* 


g 


P20 


R4 A2 H2 N E V 


F L 


P 

XT 


-> 


10 

mm, W 


D15 
i> -i. «j 


K6 T3 R2 P2 S Y G A L 


n 


*r 


7 


F19 

X X7 


D4 L3 Y2 12 A2 


s 




e 


1 

X 








r 




10 


Lll 

•UXX 


E5 N4 K3 Q2 12 


Y2 D2 T R 


±j 


7 


R 


L18 


Ell K2 S Q 




E 


8 


7 


P26 


H2 A2 I L G F 




P 


9 


g 


P17 


A6 V3 R2 Q L K 


Y F 


P 


10 


10 


Yll 


E7 D4 A2 N2 R2 


V2 SID 


Y 


11 


10 


T17 


P5 A3 R2 I S Q 


Y V K 


T 


12 


2 


G32 


K 




G 


13 


5 


P22 


R6 L3 N I 




p 


14 


3 


C31 


T A 




c 


15 


12 


K15 


R4 Y2 M2 L2 -2 


V G A I N F 


K 


16 


7 


A22 


G5 Q2 R K D F 




A 


17 


12 


R12 


K5 A2 Y3 H2 S2 


F2 L M T G P 


R 


18 


6 


121 


M4 F3 L2 V2 T 




I 


19 


7 


111 


P10 R6 S2 K2 L 


Q 


I 


20 


5 


R19 


A7 S4 L2 Q 




R 


21 


4 


Y18 


F13 W I 




Y 


22 


6 


P14 


Y14 H2 A N S 




F 


23 


2 


Y32 


F 




Y 


24 


4 


N26 


K3 D3 S 




N 


25 


10 


A12 


S5 Q3 P3 W3 L2 


T2 K G R 


A 


26 


9 


K16 


A6 T2 E2 S2 R2 


G H V 


K 


27 


5 


A18 


S8 K3 L2 T2 






28 


7 


G13 


K10 N5 Q2 R H M 


G 


29 


10 


L9 Q7 K7 A2 F2 R2 M G T N 


L 


30 


1 


C33 






C 


31 


7 


Q12 


Ell L4 K2 V2 Y 


N 


Q 


32 


11 


T12 


P5 K4 Q3 E2 L2 


G V S R A 


T 


33 


1 


P33 






F 


34 


11 


Vll 


18 T3 D2 N2 Q2 


F H P R K 


V 


35 


2 


Y31 


W2 




Y 


36 


3 


G27 


S5 R 




G 


37 


1 


G33 






G 


38 


3 


C31 


T A 




C 


39 


7 


R13 


G9 K4 Q3 D2 P M 


R 
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Table 15: continued. 

Number 
Different 





AAs 


Contents 


A n 




G22 


All 


4 X 




N20 

Haw 


Kll D2 


AO 


Q 


All 

All 


R9 S4 G3 H2 D Q K N 


A** 




li Ji 




A A 

44 




X« a 1 


Rll K 




p 


F32 


y 


Afi 


8 


K24 


E2 S2 D H V Y R 


A7 

*» / 


2 


T19 


S14 


Aft 
*» o 


g 


All 


19 E4 T2 W2 L2 R K D 




7 


E19 


D6 A2 Q2 K2 T H 




6 


E16 


D12 L2 M Q K 


51 


1 


C33 




52 


7 


R13 


M10 L3 E3 Q2 H V 


53 


8 


R21 


Q3 E2 H2 C2 G K D 


54 


7 


T23 


A3 V2 E2 I Y K 


55 


1 


C33 




56 


8 


G15 


V8 13 E2 R2 A L S 


57 


8 


G19 


V4 A3 P2 -2 R L N 


58 


8 


All 


-10 P3 K3 S2 Y2 R F 


59 


9 


-24 


G2QEAYSPR 


60 


6 


-28 


Q R I G D 


61 


3 


-31 


T P 


62 


2 


-32 


D 


63 


2 


-32 


K 


64 


2 


-32 


S 
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Table 16: Exposure in BPTI 
Coordinates taken from 

Brookhaven Protein Data Bank entry 6PTI. 

HEADER PROTEINASE INHIBITOR (TRYPSIN) 13-MAY-87 6PTI 
COMPND BOVINE PANCREATIC TRYPSIN INHIBITOR 
COMPND 2 (/BPTI$ , CRYSTAL FORM /III$) 
AUTHOR A . WLODAWER 

Solvent radius « 1.40 
Atomic radii given in Table 7 

Areas in Angstroms-squared. 



Not Not 
Total Covered covered 
Residue area by M/C fraction at all fraction 



ARG 


1 


342 .45 


205.09 


0. 5989 


152.49 


0.4453 


PRO 


2 


239.12 


92.65 


0.3875 


47.56 


0.1989 


ASP 


3 


272.39 


158.77 


0.5829 


143.23 


0.5258 


PHE 


4 


311.33 


137.82 


0.4427 


43.21 


0.1388 


CYS 


5 


241.06 


48.36 


0.2006 


0.23 


0.0010 


LEU 


6 


280.98 


151.45 


0.5390 


115.87 


0.4124 


GLU 


7 


291.39 


128.91 


0.4424 


90.39 


0.3102 


PRO 


8 


236.12 


128.71 


0.5451 


99.98 


0.4234 


PRO 


9 


236.09 


109.82 


0.4652 


45.80 


0.1940 


TYR 


10 


330.97 


153.63 


0.4642 


79.49 


0.2402 


THR 


11 


249.20 


80.10 


0.3214 


64.99 


0.2608 


GLY 


12 


184.21 


56.75 


0.3081 


23.05 


0.1252 


PRO 


13 


240.07 


130.25 


0.5426 


75.27 


0.3136 


CYS 


14 


237.10 


75.55 


0.3186 


53.52 


0.2257 


LYS 


15 


310.77 


200.25 


0.6444 


192.00 


0.6178 


ALA 


16 


209.41 


66.63 


0.3182 


45.59 


0.2177 


ARG 


17 


351.09 


243.67 


0.6940 


201.48 


0.5739 


ILE 


18 


277.10 


100.51 


0.3627 


58.95 


0.2127 


ILE 


19 


278.03 


146.06 


0.5254 


96.05 


0.3455 


ARG 


20 


339.11 


144.65 


0.4266 


43.81 


0.1292 


TYR 


21 


333.60 


102.24 


0.3065 


69.67 


0.2089 


PHE 


22 


306.08 


70.64 


0.2308 


23.01 


0.0752 


TYR 


23 


338.66 


77.05 


0.2275 


17.34 


0.0512 


ASN 


24 


264.88 


99.03 


0.3739 


38.69 


0.1461 


ALA 


25 


211.15 


85.13 


0.4032 


48.20 


0.2283 


LYS 


26 


313.29 


216.14 


0.6899 


202.84 


0.6474 


ALA 


27 


210.66 


96.05 


0.4560 


54.78 


0.2601 


GLY 


28 


186.83 


71.52 


0.3828 


32.09 


0.1718 


LEU 


29 


280.70 


132.42 


0.4718 


93.61 


0.3335 


CYS 


30 


238.15 


57.27 


0.2405 


19.33 


0.0812 


GLN 


31 


301.15 


141.80 


0.4709 


82.64 


0.2744 


THR 


32 


251.26 


138.17 


0.5499 


76.47 


0.3043 
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Table 16, continued. 



PHE 


33 


304.27 


59.79 


0. 


1965 


18.91 


0. 


0622 


VAL 


34 


251.56 


109.78 


0. 


4364 


42.36 


0. 


1684 


TYR 


35 


332.64 


80.52 


0. 


2421 


15.05 


0. 


0452 


GLY 


36 


187.06 


11.90 


0. 


0636 


1.97 


0. 


0105 


GLY 


37 


185.28 


84.26 


0. 


4548 


39.17 


0. 


2114 


CYS 


38 


234.56 


73.64 


0. 


3139 


26.40 


0. 


1125 


ARG 


39 


417.13 


304.62 


0. 


7303 


250.73 


0. 


6011 


ALA 


40 


209.53 


94.01 


0. 


4487 


52.95 


0. 


2527 


LYS 


41 


314.60 


166.23 


0. 


5284 


108.77 


0. 


3457 


ARG 


42 


349.06 


232.83 


0. 


6670 


179.59 


0.5145 


ASN 


43 


266.47 


38.53 


0. 


1446 


5.32 


0. 


0200 


ASN 


44 


269.65 


91.08 


0. 


3378 


23.39 


0. 


0867 


PHE 


45 


313.22 


69.73 


0. 


2226 


14.79 


0. 


0472 


LYS 


46 


309.83 


217.18 


0. 


7010 


155.73 


0. 


5026 


SER 


47 


224.78 


69.11 


0. 


3075 


24.80 


0. 


1103 


ALA 


48 


211.01 


82.06 


0.3889 


31.07 


0. 


1473 


GLU 


49 


286.62 


161.00 


0. 


5617 


100.01 


0. 


3489 


ASP 


50 


299.53 


156.42 


0. 


5222 


95.96 


0. 


3204 


CYS 


51 


238.68 


24.51 


0. 


1027 


0.00 


0. 


0000 


MET 


52 


293.05 


89.48 


0. 


3054 


66.70 


0. 


2276 


ARG 


53 


356.20 


224.61 


0. 


6306 


189.75 


0.5327 


THR 


54 


251.53 


116.43 


0. 


4629 


51 . 64 


0. 


2053 


CYS 


55 


240.40 


69.95 


0. 


2910 


0.00 


0. 


0000 


GLY 


56 


184.66 


60.79 


0. 


3292 


32.78 


0. 


1775 


GLY 


57 


106.58 


49.71 


0. 


4664 


38.28 


0. 


3592 



ALA 58 no position given in Protein Data Bank 



"Total area" 



"Not covered 
by M/C" 



"Not covered 
at all" 



is the area measured by a rolling sphere 
of radius 1.4 A f where only the atoms 
within the residue are considered. This 
takes account of conformation. 

is the area measured by a rolling sphere 
of radius 1.4 A where all main-chain atoms 
are considered, fraction is the exposed 
area divided by the total area. Surface 
buried by main-chain atoms is more 
definitely covered than is surface covered 
by side group atoms. 

is the area measured by a rolling sphere 
of radius 1.4 A where all atoms of the 
protein are considered. 
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Table 17: Plasmids used in Detailed Example 
Phage Contents 

LG1 M13mpl8 with Ma II/Aat II/AcS I/Ese 

II/Sau I adaptor 
pLG2 LG1 with amp R and ColEl of pBR322 cloned 

into Aat II/Acc I sites 
pLG3 pLG2 with Acc I site removed 

pLG4 pLG3 with first part of osp-pbd gene 

cloned into Rsr II/Sau 1 sites, 

Avr II/ Asu II sites created 
pLG5 pLG4 with second part of osp-pbd gene 

cloned into Avr II/ Asu II sites, BssH I 

site created 

pLGS pLG5 with third part of osp-pbd gene 

cloned into Asu II/BssH I sites , Bbe I 
site created 

pLG7 pLG6 with last part of osp-pbd gene 

cloned into Bbe I / Asu II sites 

pLGS pLG7 with disabled osp-pbd gene, same 

length DNA. 

pLG9 pLG7 mutated to display BPTI(V15 B p T j) 

pLGiO pLG8 + tet R gene - amp R gene 

pLGll pLG9 + tet R gene - amp R gene 



WO 90/02809 PCT/US89/03731 

223 

Table 25: Annotated Sequence of ipbd gene 
5 '- C | GGA| CCG | TAT | CCA | GGC | TTT | ACA | CTT | TAT | 

28 

| Rsr'll 1 | -35 I. 



|GCT|TCC|GGC|TCG|TAT|AAT|GTG|TGG| 52 



I AAT | TGT | GAG | CGG | ATA | ACA | ATT | 73 
I lac operator |_ 



|CCT|AGG|AGG|CTC|ACT| 88 
| Avr II | 

| S. D. | 



|m|k|k|s|l|v|l|k1afs| 
I 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 
| ATG | AAG | AAA | TCT | CTG | GTT | CTT | AAG | GCT | AGC | 118 

| Afl II | Nhe I | 



| v | a | v | a | t | 1 | v | p | m [ 1 | 
| 11| 12| 13| 14 | 15| 16| 17 | 18| 19 | 20| 
j GTT | GCT | GTC | GCG | ACC | CTG | GTA | CCG | ATG | CTG | 148 
1 Nru I | | Kpn I | 

|s]f|a|r|pfd|f|c|l|e| 
| 21 1 22 | 23 | 24 | 25 1 26 1 27 | 28 j 29 j 30| 
| TCT | TTT | GCT | CGT | CCG | GAT | TTC | TGT | CTC | GAG | 178 
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Table 25, continued. 
UscI£Xj | Ava I | 

1 Xho I | 



|p|p|y|t|g|p|c|k|a|r| 
| 31| 32 | 33 1 34 j 35 ( 36 1 37 | 38 1 39 1 40 1 
| CCG | CCA | TAT | ACT | GGG | CCC | TGC | AAA | GCG | CGC | 208 
I Pf 1M I L 1BSSH II) 

I Apa I | 

| Dra II | 

I Pss I I 



|i|i|r|y|f|y|n[a|k| 
| 41 | 42 | 43) 44 | 45 | 46 | 47 | 48 | 49 | 
| ATC | ATC | CGT | TAT[TTC | TAC| AAC | GCT[ AAA! 
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Table 25, continued. 
|a|g|l|c|q|t|f|v|y|g|g| 

| 50 | 51| 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 
| GCA | GGC | CTG | TGC | CAG | ACC | TTT | GTA | TAC | GGT | GGT | 268 
| Stu 1 1 | Acc I | 

| Xca I | 



| c | r | a|k| r |n| b| f | k| 
| 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 

|TGC|CGT|GCT|AAG|CGT|AAC|AAC|TTT|AAA| 295 
I Esp I I 



| s | a | e | d | c | i | r | t | c | g | 
| 70| 71| 72 1 73 1 74|"75| 76 1 77 1 78] 79 j 
| TCG | GCC | GAA | GAT | TGC | ATG | CGT | ACC | TGC | GGT | 325 
| Xmalll | 1 Sph I | 

|g|a|a|e|g|d|d| 
| 80 1 81 1 82 | 83 | 84 | 85 1 86 J 

| GGC | GCC | GCT | GAA | GGT | GAT | GAT | 346 
| Bbe I [ 
| Nar I | 



|p|a|k|a|a| 
| 87 | 88 | 89 | 90 | 9l| 

|CCG|GCC|AAA|GCG|GCC| 361 
| Sfi I L 
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|f|n|s|l|g|a|s|a|t| 

Table 25, continued. 
| 92 | 93 | 94 | 95 | 96 | 97 | 98 [ 99|l00| 
|TTT|AAC|TCT|CTG|CAA|GCT|TCT|GCT|ACC| 388 

|Hind 3 | 



|e|y|i|g|y|a|w| 
| 101 | 102 | 103 | 104 | 105 | 106 | 107 | 

| GAA | TAT | ATC | GGT | TAC | GCG | TGG | 409 

| Mlu I | 



| a | m | v | v | v | 
1 108 1 109 1 110 | 111 1 112 | 

|GCC|ATG|GTG|GTG|GTT| 424 

| BstX I J_ 

{ Nco I | 
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Table 25, continued. 

|i|v|g|a|t|i|g|i| 
1 113 1 114 1 115 1 116 J 117 1 118 1 119 1 120 1 

| ATC | GTT | GGT | GCT | ACC | ATC | GGT | ATC | 448 



|k|l|f|k|k|f|t|s|k|a| 
| 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 | 
| AAA | CTG | TTT | AAG | AAA | TTT | ACT | TCG | AAA | GCG | 478 

lAsu II| 



i S | . | . | . | 
1 131 1 132 1 133 1 134 | 

|TCT|TAA|TAG|TGAlGGT|TAC|CAG|TCT| 502 
| BstE II| 



| AAG | CCC | GCC | TAA | TGA | GCG | GGC | TTT | TTT | TTT | 532 
| Trp terminator , L 



. |CCT|GAG|G -3' 539 
I Sau I I 



Note the following enzyme equivalences, 

Xma III = lag I 

Acc III = BspM II 

Dra II = ECOO109 I 

Asu II = BstB I 

Sau I = BSU36 I 
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Table 27: DNA_synthl 



5' | CCG | TCC | GTC | GGA | CCG | TAT | CCA | GGC | TTT | ACA | CTT | TAT | 



| GCT | TCC | GGC | TCG | TAT | AAT | GTG | TGG | 



| AAT | TGT | GAG | CGG \ ATA | ACA | ATT | 
olig#4 = 3'- gt taa 



|CCT|AGG| 
gga tec 



/ 3' = olig#3 
| GCC | GCT I CCT 1 TCG | A AA | GCG | 
egg cga gga age ttt cgc 



| TCT | TAA I TAG | TGA | GGT | TAC | CAG | TCT | 
aga att ate act cca atg gtc aga 

| AAG | CCC | GCC | TAA | TGA | GCG | GGC | TTT | TTT | TTT | 
ttc ggg egg att act cgc ccg aaa aaa aaa 



e 



| CCT | GAG | GCA| GGT | GAG | CG 
gga etc cgt cca etc gc - 5' 
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"Top" strand 99 

"Bottom" strand 100 

Overlap 23 

Net length 158 
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Table 27, continued. 



(14 c/g and 9 a/t) 
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Table 28: DNA_seq2 



5'- |gca|cca|acg| 
| spacer | 

» 



| CCT | AGG | AGG | CTC | ACT | 
I Avr II| 

1 S. D. | 



|m|k|k|s|l|v|l|k|a|s| 
|1|2|3|4|5|6|7|8|9|10| 
| ATG | AAG | AAA | TCT | CTG | GTT | CTT | AAG | GCT | AGC | 

[ Afl II | Nhe T | 



I v I a | v | a | t | 1 | v | p | n | l | 
| 11| 12| 13| 14 | 15 | 16 | 17| 18 | 19 j 20| 
| GTT | GCT | GTC | GCG | ACC | CTG | GTA | CCG | ATG | CTG | 
1 Nru U | Ron l| 



|s|f|a|r|p|d|f|c 

| 21| 22 1 23 j 24 1 25 1 26 1 27 1 28 
| TCT [ TTT | GCT | CGT | CCG | GAT | TTC | TGT 
jAccIIl| 



|P|p|y|t|g|p|c|k 
| 31| 32 | 33 | 34 | 35| 36| 37 | 38 
| CCG | CCA | TAT | ACT | GGG | CCC | TGC | AAA 
I PflM I | 
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Table 28, continued. 



I Apa I 1 
1 Dra II | 
| Pss I | 



I i I i I r | 
| 41| 42 | 43 | 
| ate | ate | cgt | 



r 1 1 s i k i 

| 127 | 128 | 129 | 

| ACT | TCG | AAa | gcg | get | gcg | - 3' 
[Asu II | spacer 1_ 
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Table 30: DNA_seq3 



I a | r | 
| 39 1 40 | 
5'- |ccc|tgc|aca|GCG|CGC| 
| spacer |BssH II | 



|i|i[r|y|f| Y |n|a|lc| 
| 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 [ 49 | 
|ATC|ATC|CGT|TAT|TTC|TAC|AAC|GCT|AAA| 



|a|g|l|c|q|t|f|v|y|g|g| 
| 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 
| GCA | GGC | CTG | TGC | CAG | ACC | TTT | GTA | TA£ | GGT | GGT | 
| Stu I | | ACC I | 



|o|r|a|k|r|n|n|f|k| 
| 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 ] 69 { 
| TGC | CGT | GCT | AAG | CGT | AAC | AAC | TTT J AAA | 
1 Bap I 1 



|s|a|e|d|c|m|r|t|c|g| 
I 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 
| TCG | GCC | GAA | GAT | TGC | ATG | CGT | ACC | TGC | GGT | 
| Xmalll | | SPh I | 



| Xca I | 



| 80 j 8l| 



WO 90/02809 



233 



PCT/US89/03731 



Table 30, continued. 



|GGC|GCC|gct|gaa| 
| Rbe I | spacer 
1 t?ar I 1 



| t | s | k | 
| 127 | 128 | 129 | 
| ttt | acT | TCG | AAa | gcg | teg | ccg | - 3' 
lAsu II| 
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Table 32: DNA_seq4 



|g|a|a|e|g|d|d| 
5' | 80| 81| 82 [ 83 | 84 | 85 | 86 | 

| cct | cgc J cct | GGC | GCC | GCT | GAA| GGT | GAT | GAT | 
| spacer | Bbe I ] 

1 Nar I | 



| p | a | k | a | a | 
| 87 | 88 | 89 | 90 | 9l| 
| CCG 1 GCC | AAA | GCG | GCC | 
| Sfi I I 

| f I n | s. | 1 | q | a | s | a | t | 
| 92 | 93 | 94 | 95| 96 j 97 | 98 | 99|l00| 
| TTT | AAC | TCT | CTG| CAA[GCT | TCT | GCT | ACC | 

| Hind 3 | 



|e[y|i|g|y|a|w| 

1 101 1 102 1 103 1 104 1 105 1 106 1 107 | 
| GAA | TAT | ATC | GGT | TAC | GCG | TGG | 

1 Mlu I| 



I a | m | v | v | v | 
| 108 | 109 | 110 | 111 | 112 | 
| GCC | ATG | GTG | GTG | GTT | 

| BstX I |_ 

| NCO I | 
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Table 32, continued. 
|i|v|g|a|t|i|g|i| 

| 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 
| ATC | GTT | GGT | GCT | ACC | ATC | GGT | ATC | 

|k|l|f|k|k|f|t|s|k| 

| 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 

| AAA | CTG | TTT | AAG | AAA | TTT | ACT | TCG | AAa | gcg | teg | ggc | - 3' 

|Asu II | spacer |_ 
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Table 34: Some interaction sets in BPTI 
Number 



Res . Dif f . . ■ 





AAs 


Contents 


BPTI 


1 


2 


3 


4 


5 


-5 


2 


D -32 














-4 


2 


E -32 














-3 


5 


T P 


F Z -29 














-2 


10 


Z3 R3 Q2 T2 H G L K E -18 


_ 












-1 


10 


D4 T2 P2 Q2 E G N K R -18 














1 


10 


R21 


A2 K2 H2 P L I T G D 


R 










5 


2 


9 


P20 


R4A2H2NEVFL 


P 








s 


5 






D15 


K6 T3 R2 P2 S Y G A L 


D 








4 


s 


4 


7 


F19 


D4 L3 Y2 12 A2 S 


F 








s 


5 


5 


1 


C33 




C 








X 


X 


6 


10 


Lll 


E5 N4 K3 Q2 12 Y2 D2 T R 


L 








4 




7 


5 


L18 


Ell K2 S Q 


E 






s 


4 




8 


7 


P26 


H2 A2 I L G F 


P 






3 


4 




9 


9 


P17 


A6 V3 R2 Q L K Y F 


P 




s 


3 


4 




10 


10 


Yll 


E7 D4 A2 N2 R2 V2 S I D 


Y 


s 




s 


4 




11 


10 


T17 


P5 A3 R2 I S Q Y V K 


T 


1 


s 


3 


4 




12 


2 


G32 


K 


G 


X 




X 


X 





9 
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Table 34, continued. 



13 


5 


P22 


R6 L3 N I 


P 


1 


S 4 S 


14 


3 


C31 


T A 


C 


1 


s s 5 


JL -J 


12 


K15 


R4 Y2 M2 L2 -2 V G A I N F 


K 


1 s 


3 4s 


16 


7 


A22 


65 Q2 R K D F 


A 


1 s 


s s 5 




12 


R12 


K5 A2 Y3 H2 S2 F2 L M T G P 


R 


1 2 


3 s 


xo 




121 


M4 F3 L2 V2 T 


I 


1 s 


s 5 


1 Q 
X:* 


7 


111 


P10 R6 S2 K2 L Q 


I 


1 2 


3 s 


z u 


5 


R19 


A7 S4 L2 Q 


R 


s s 


s 5 




4 


Y18 


F13 W I 


Y 


2 


s s s 


9 9 
z z 




F14 


Y14 H2 A N S 


F 


s 


3 4 


9"* 

Z <J 


2 


Y32 


F 


Y 




s s 


94 


4 


M26 


K3 D3 S 


N 


s 


3 


9 R 

Z -J 


10 


A12 


S5 Q3 P3 W3 L2 T2 K G R 


A 


s 


s 


96 


9 


K16 


A6 T2 E2 S2 R2 G H V 


K 


s 


3 4 


97 
z / 


5 


A18 


S8 K3 L2 T2 


A 


2 


3 4 


28 


7 


G13 


K10 N5 Q2 R H M 


G 


2 


S S 


29 


10 


L9 Q7 K7 A2 F2 R2 M G T N 


L 


2 


3 


30 


1 


C33 




C 


X 


X X 


31 


7 


Q12 


Ell L4 K2 V2 Y N 


Q 


2 


3 4 


^ 9 
J Z 


1 1 

XX 


T12 


P5 K4 Q3 E2 L2 G V S R A 


T 


2 


3 s 


33 


1 


F33 




F 


X X 


X X 


34 


11 


VI 1 


18 T3 D2 N2 Q2 F H P R K 


V 


1 2 


3 S 


35 


2 


Y31 


W2 


Y 


s s 


s 5 


36 


3 


G27 


S5 R 


G 


1 




37 


1 


G33 




G 


X 


X 


38 


3 


C31 


T A 


C 


1 


s 5 


39 


7 


R13 


G9 K4 Q3 D2 P M 


R 


1 


4 s 
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Table 34, continued. 



40 


2 


622 


All 


A 


s 


s 


5 


41 


3 


N20 


Kll D2 


K 




4 


s 


42 


9 


All 


R9 S4 G3 H2 D Q K N 


R 




s 


5 


43 . 


2 


N31 


G2 


N 






s 


44 


3 


N21 


Rll K 


N 






s 


45 


2 


F32 


Y 


F 






s 


46 


8 


K24 


E2 S2 D H V Y R 


K 






5 


47 


2 


T19 


S14 


S 


s 




5 


48 


9 


All 


19 E4 T2 W2 L2 R K D 


A 


2 


s 


s 


49 


7 


E19 


D6 A2 Q2 K2 T H 


E 


2 




s 


50 


6 


E16 


D12 L2 M Q K 


D 


s 




5 


51 


1 


C33 




C 


X 




X 


52 


7 


R13 


M10 L3 E3 Q2 H V 


M 


2 




s 


53 


8 


R21 


Q3 E2 H2 C2 G K D 


R 


s 




5 


54 


7 


T23 


A3 V2 E2 I Y K 


T 






5 


55 


1 


C33 




C 






X 


56 


8 


G15 


V8 13 E2 R2 A L S 


G 








57 


8 


G19 


V4 A3 P2 -2 R L N 


G 








58 


8 


All 


-10 P3 K3 S2 Y2 R F 


A 








59 


9 


-24 


G2QEAYSPR 










60 


6 


-28 


Q R I G D 










61 


3 


-31 


T P 










62 


2 


-32 


D 










63 


2 


-32 


K 










64 


2 


-32 


S 











s indicates secondary set 

x indicates in or close to surface but buried and/or highly 
conserved. 
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Table 35: 
Distances from C beta to 
Tip of Side Group 
in Angstroms 



Amino Acid type Distance 

A 0.0 

C (reduced) I- 8 

D 2.4 

E 3.5 

F 4.3 

G 

H 4.0 

I 2.5 

K 5.1 

L 2.6 

M 3.8 

N 2.4 

p 2.4 

Q 3 - 5 

R 6.0 

S 1.5 

T 1.5 

V 1.5 
W 5.3 

Y 5.7 



Notes: These distances were calculated for standard model parts 
with all side groups fully extended. 
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Table 36: Distances, BPTI residue set #2 
Distances in Angstroms between Cb e t a s. 
Hypothetical Cj3 eta was added to each Glycine. 

R17 119 Y21 A27 G28 L29 Q31 T32 V34 A48 
119 7.7 

Y21 15.1 8.4 

A27 22.6 17.1 12.2 

G28 26.6 20.4 13.8 5.3 

L29 22.5 15.8 9.6 5.1 5.2 

Q31 16.1 10.4 6.8 6.8 10.6 6.8 

T32 11.7 5.2 6.1 12.0 15.5 10.9 5.4 
V34 5.6 6.5 11.6 17.6 21.7 18.0 11.4 8.2 

A48 18.5 11.0 5.4 12.6 13.3 8.4 8.8 8.3 15.7 

E49 22.0 14.7 8.9 16.9 16.1 12.2 13.9 13.3 19.8 5.5 

M52 23.6 16.3 8.6 12.2 10.3 7.6 11.3 13.2 20.0 6.2 

P9 14.0 11.3 9.0 12*2 15.4 13.3 7.9 9.2 8.7 13.9 
Til 9.5 11.2 13.5 18.8 22.5 19.8 13.5 12.1 5.7 18.5 
K15 7.9 14.6 20.1 27.4 31.3 27.9 21.4 18.1 10.3 24.6 
A16 5.5 10.1 15.9 25.2 28.5 24.6 18.6 14.5 8.6 19.8 
118 6.1 6.0 11.2 21.3 24.4 20.2 14.7 10.4 7.0 15.0 

R20 10.6 5.9 5.4 16.0 18.5 14.6 9.8 6.9 7.8 10.2 

F22 15.6 10.9 5.6 10.5 12.8 10.3 6.2 8.1 10.8 10.3 

N24 19.9 14.7 9.4 4.1 7.3 6.1 4.8 10.0 14.7 11.4 

K26 24.4 20.1 15.2 5.4 7.7 9.8 10.1 15.3 19.0 17.0 

C30 18.9 12.1 4.6 8.8 9.5 5.3 5.9 8.2 14.9 4.9 

F33 10.8 7.4 7.7 12.6 16.4 13.0 6.6 5.6 5.5 12.2 
Y35 8.4 7.4 9.4 18.4 21.4 17.9 12.2 9.5 5.8 14.4 

S47 17.6 10.6 6.6 17.3 17.9 13.4 12.6 10.4 15.9 5.3 

D50 20.0 13.6 7.2 17.2 16.8 13.5 13.5 12.9 17.6 7.6 

C51 18.9 12.2 4.0 12.1 12.2 8.8 8.8 9.7 15.3 5.4 

R53 25.4 18.6 11.0 17.2 15.0 13.0 15.7 16.7 22.3 9.7 

R39 15.4 16.9 17.1 24.9 27.2 24.9 20.1 18.7 13.8 22.3 
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Table 36, continued. 
Distances in Angstroms between Cb e t a s * 



Hypothetical Cfc eta was added to each Glycine . 





E49 


M52 


P9 


Til 


K15 


A16 


118 


R20 


F22 


N24 


M52 


6.1 




















P9 


17.7 


15.5 


















Til 


22.1 


21.5 


7.2 
















K15 


27.5 


28.7 


16.4 


9.5 














A16 


22.2 


24.2 


14.9 


9.8 


6.2 












118 


17.4 


19.5 


12.2 


9.5 


10.4 


4.9 










R20 


13.0 


13.8 


8.0 


9.4 


14.9 


10.6 


6.2 








F22 


13.8 


11.4 


4.1 


10.6 


19.1 


16.3 


12.7 


6.9 






N24 


15.6 


11.2 


8.4 


15.3 


24.1 


21.9 


18.2 


12.7 


6.6 




K26 


20.9 


15.7 


12.1 


18.6 


27.9 


26.6 


23.3 


18.1 


11.6 


5.9 


C30 


8.7 


5.6- 


10.6 


16.6 


24.1 


20.2 


15.7 


9.8 


6.8 


6.9 


F33 


16.5 


15.4 


4.2 


7.1 


15.0 


12.8 


9.6 


6.1 


5.6 


9.3 


Y35 


17.2 


17.8 


7.8 


5.8 


11.0 


7.6 


4.9 


4.3 


8.8 


14.8 


S47 


4.7 


9.1 


15.3 


18.5 


23.1 


17.6 


12.8 


9.1 


12.0 


15.3 


D50 


5.5 


7.7 


14.7 


18.6 


24.2 


19.2 


14.7 


9.9 


11.0 


14.7 


C51 


7.1 


5.4 


11.0 


16.4 


23.5 


19.2 


14 .6 


8.7 


6.9 


9.6 


R53 


6.3 


5.6 


17.9 


23.1 


29.6 


24.8 


20.3 


15.0 


13.8 


15.5 


R39 


23.9 


24.0 


13.0 


9.5 


12.0 


11.8 


12.5 


12.8 


14.7 


20.8 




K26 


C30 


F33 


Y35 


S47 


D50 


C51 


R53 






C30 


12.4 




















F33 


13.9 


10.1 


















Y35 


19.5 


13.5 


6.4 
















S47 


21.0 


8.8 


13.5 


13.2 














D50 


20.1 


8.6 


14.3 


13.7 


5.0 












C51 


15.0 


3.7 


10.9 


12.5 


6.9 


5.2 










R53 


19.9 


9.9 


18.2 


18.8 


9.4 


5.8 


7.4 








R39 


24.3 


20.6 


14.4 


9.6 


20.4 


19.0 


18.8 


23.4 
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Table 37: vgDNA to vary BPTI set #2.1 

+ 











g 


P 


c 


k 


a 


X 










35 


36 


37 


38 


39 


40 




5'-| 


CAC 


CCT 


GGG 


CCC 


TGC 


AAA 


GCG 


qfk 






spacer 


Aps 


\ I, 










• 

1 


+ 
X 


r 


y 


f 


y 


n 


a 


k 




41 


42 


43 


44 


45 


46 


47 


48 


49 




ATC 


qfk 


CGT 


TAT 


TTC 


TAC 


AAC 


GCT 


AAA 





208 



235 



X 

50 
qfk 


g 

51 
GGt 


X 
52 
gfk 


c 
53 
TGC 


g 

54 
CAG 


t 
55 
ACC 


f 

56 
TTc 


oligi 
78 nl 


128= 3'- acg gtc tgg aag 

:s 



/ 3' = olig#27 72 nts 
+ 

x y g g 

57 58 59 60 
qfk TAC GGT GGT 



268 



Overlap = 12 (7 CG, 5 AT) 
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r 


a 


k 


r 


n 


n 


f 


k 


61 


62 


63 


64 


65 


66 


67 


68 


69 


TGC 


CGT 


GCT 


AAG 


CGT 


AAC 


AAC 


TTT 


AAA 



acg gca cga ttc gca ttg ttg aaa ttt 
1 Psp I I 



+ 



s 


X 


e 


d 


C 


m 


70 


71 


72 


73 


74 


75 


TCT 


qfk 


GAG 


GAT 


TGC 


ATG 



age **m etc eta acg tac gca ccc acc -5' 

| Soh 1 1 spacer | 

k = equal parts of T and G; m = equal parts of C and A; 
q = (.26 T, .18 C, .26 A, and .30 G) ; 
f = (.22 T, .16 C, .40 A, and .22 G) ; 
* = complement of symbol above 

Residue 40 42 50 52 57 71 

Possibilities 21 X 21 x 21 x 21 x 21 x 21 = 8.6 x 10 7 
Abundance x 10: 

of PPBD .768 .271 .459 .671 .600 .459 

Produce ■ 1.77 x 10~ 8 

Parent = 1/(5.5 x 10 7 ) least favored - 1/(4.2 x 10 9 ) 
Least favored one-amino-acid substitution from PPBD present 
at 1 in 1.6 x 10 7 
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Table 38: Result of varying set#2 of BPTI 2.1 



1 


e 


29 


30 


CTC 


GAG 


Ava I 


Xho I 



178 



P 
31 
CCG 



P 
32 
CCA 



y 

33 
TAT 



PflM I 
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p 


C 


k 


a 
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47 


48 


49 




ATC 
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52 
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56 
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c 


r 


a 
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CGT 
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AAC 
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70 


71 


72 


73 


74 
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77 
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GGT 
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Table 39: vgDNA to vary set#2 BPTI 2.2 
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TGC 


mrA 
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208 




1 spacer 


Ape 
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+ 




+ 


+ 
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X 
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41 


42 


43 


44 


45 


46 


47 


48 


49 








CAG 


rvk 


TwT 


TTC 


TAC 


AAC 


6CT 


AAA 




235 



E 
50 
6A6 



+ 
X 
51 



L 

52 
CT6 



c 
53 
T6C 



+ 
X 
54 
qfk 



+ 

55 
Qfk 



f 
56 
TTT 



S 

57 
TC6 



y 

58 
TAC 



g 

59 
G6T 



g 

60 
GGT 



268 



91 nts olig#30 3'- g cca cca 
Overlap =15 (11 CG, 4 AT) 

/- 3' olig#29 94 nts 
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72 


73 


74 
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qfk 


6AT 


TGC 


AT6 



age acc **m eta acg tac gcg acc tgc 

| Sph I | spacer | 



-5' 



k 
m 
w 

q 
f 
* 



equal parts of T and 6; v = equal parts of C, A, and G; 

equal parts of C and A? r = equal parts of A and G; 

equal parts of A and T; 

(.26 T, .18 C, .26 A f and .30 G) ; 

(.22 T, .16 C, .40 A, and .22 G) ; 

complement of symbol above 



Residue 
Possibilities 

Abundance x 
Product = 2.3 x 



38 41 43 44 51 54 55 72 
4x 4X 9x 2x21x21x21x21 

= 6.2 x 10 7 
10 2.5 2.5 .833 5. .663 .397 .437 .602 
10~ 8 



Parent = 1/(4.4 x 10 7 ) least favored = 1/(1.25 x 10 9 ) 
Least favored one-amino-acid substitution from PPBD present 
at 1 in 1.2 x 10 7 
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Table 40: Result of varying set#2 of BPTI 2.2 
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235 
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51 


52 


53 


54 


55 


56 


57 


58 
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60 


GAG 


TTT 


CTG 


TGC 


TCT 


GCT 


TTT 


TCG 


TAC 


GGT 


GGT 
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Table 41: vg DNA set #2 of BPTI 2.3 



















1 


e 




















29 


30 












5'- 
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178 












I SDacer 


Xho I 
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+ 
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31 


32 


33 


34 


35 


36 


37 


38 


39 


40 




CCG 


vma 


TAT 


vma 


GGG 


CCC 


TGC 


GAG 


GCG 


qfk 


208 



V 
41 
GTT 


Q 
42 
CAG 


N 
43 
AAT 


44 

Tdk 


f 

45 
TTC 


y 

46 
TAC 


n 
47 
AAC 


a 
48 
GCC 


k 
49 
AAa 


67 n1 


;s o] 


Lig#34 3'- g atg ttg egg ttc 



-3' olig#33 71 nts 



Overlap =13 (7 CG, 6 AT) 



+ + + + 
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F 


X 


c 


S 


x 


f 


X 


y 
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50 


51 


52 


53 


54 


55 


56 


57 


58 


59 


60 


VAG 


TTT 


nTk 


TGC 


TCT 


qfk 


TTT 


qfk 


TAC 


GGT 


GGT 



btc aaa nam acg aga **m aaa **m atg cca cca 



c 


r 


a 


k 


61 


62 


63 


64 


TGC 


CGT 


GCT 


AAG 



acg gca cga ttc gcg acc ggc 
| Esp I | spacer | 

k = equal parts of T and G; m = equal parts of C and A; 
w = equal parts of A and T; n = equal parts of A,C,G,T; 
d = equal parts A,G,T; v = equal parts A,C,G; 

q = (.26 T, .18 C, .26 A, and .30 G) ; 
f = (.22 T, .16 C, .40 A, and .22 G) ; 
* - complement of symbol above 

Residue 32 34 40 44 50 52 55 57 

Possibilities 6x 6 x 21 x 6x 3x 5x21x21 = 

3 x 10 7 

Abundance x 10 

of PPBD 10/6 10/6 .545 10/6 10/3 30/8 .459 .701 

product =1.01 x 10~ 7 

parent = 1/(1 x 10 7 ) least favored = 1/(4 x 10 8 ) 

Least favored one-amino-acid substitution from PPBD present 

at 1 in 3 x 10 7 
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Table 42: Result of varying set#2 of BPTI 2.3 
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1. A method of obtaining a protein that binds a 
predetermined target that comprises: 

a) preparing a variegated population of replicable 
genetic packages, each package including a nucleic 
acid construct coding on expression for an outer- 
surface-displayed potential binding protein other 
than a single chain antibody comprising (i) a 
structural signal directing the display of the 
protein on the outer surface of the package and 
(ii) a potential binding domain for binding said 
target, where a plurality of different potential 
binding domains are displayed by said population, 

b) causing the expression of said proteins and the 
display of said proteins on the outer surface of 
such packages, 

c) contacting the packages with target material so 
that the potential binding domains of the proteins 
and the target material may interact, and 
separating packages bearing a binding domain that 
binds target material from packages that do not so 
bind, and 

d) recovering and replicating at least one package 
bearing a successful binding domain, 

preferably further comprising (e) determining the 
amino acid sequence of a successful binding 
domain , 
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and more preferably, further comprising (f) 
preparing a new variegated population of 
replicable genetic packages according to step (a) , 
the parental potential binding domain for the 
potential binding domains of said new packages 
being a successful binding domain whose sequence 
was determined in step (e) , and repeating steps 
(b)-(e) with said new population. 

The method of claim 1 wherein the population of 
replicable genetic packages of step (a) is 
obtained by: 

i) preparing a variegated population of DNA 
inserts of each of which comprises a first 
sequence which codes on expression for a potential 
binding domain and, a second sequence encoding 
signal directing that the encoded protein be 
displayed on the outer surface of a chosen 
replicable genetic package, and 

ii) incorporating the resulting population of DNA 
constructs into the chosen replicable genetic 
packages to produce a population of replicable 
genetic packages, 

wherein preferably (1) said population is 
characterized by the display of at least 10 5 but 
not more than 10 9 different potential binding 
domains and/or (2) from 1 in 10 4 to 1 in 10 9 of 
the packages of said population display the same 
potential binding domain. 

The method of claim 1 wherein, in step (a) , the 
potential binding domains encoded by the nucleic 
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acid constructs are each related in sequence to a 

parental potential binding domain by a limited 

number of amino acid substitutions in the amino 

acid sequence of said parental potential binding 

domain, and, preferably the level of variegation 

of the population is chosen such that the packages 

displaying potential binding domains obtained by 5 

single amino acid substitutions in the amino acid t 

sequence of the parental potential binding domain 

are present in detectable amounts, and preferably 

the initially chosen parental potential binding 

protein has at least one stable binding domain and 

said domain has a melting point of at least 60°C 

and is stable over a pH range of at least 3.0-8.0. 

4. The method of claim 1 wherein the displayable 
potential binding protein is a chimeric protein, 
and preferably, wherein said signal is provided 
by a segment of said chimeric protein which is 
essentially identical in amino acid sequence with 
at least a functional portion of a natural outer 
surface protein encoded by said genetic package or 
a cell naturally infected by said genetic package, 
said portion directing the transport of said 
chimeric protein to the outer surface of the 
genetic package. 

5. The method of claim 3 wherein the parental 
potential binding domain is initially chosen to be 
one which is over 50% homologous with a domain of 

a known protein, the latter domain having a < 
melting point of at least about 60°C. 
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6. The method of claim 5 wherein the initially chosen 
parental binding protein does not preferentially 
bind the predetermined target. 

7. The method of claim 3, said target material 
comprising one or more discrete molecules, said 
parental potential binding domain being 
characterized as a sequence of amino acids, 
further comprising identifying an interaction set 
of amino acids which are on the surface of the 
parental potential binding domain and which can 
all simultaneously touch a single molecule of the 
target material, and obtaining potential binding 
domains by substituting a different amino acid for 

• one or more of the amino acids in said interaction 
set. 

8. The method of 'claim 1 wherein the target material 
is a non-macromolecular organic compound and the 
potential binding domains comprise greater than 
about 80 amino acid residues. 

9. The method of claim 1 wherein the target material 
is a non-macromolecular organic compound and the 
potential binding domains comprise greater than 
about 80 amino residues. 

10. The method of claim 1 wherein the target material 
is a mineral insoluble in aqueous solution. 

11. The method of claim 1 wherein the target is an 
inorganic molecule or complex ion that is stable 
in aqueous solution. 
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12. The method of claim 1 wherein the target is an 
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organometallic compound that is stable in aqueous 
solution. 

13. The method of claim 1 wherein the target material 
is a general protease , wherein the immobilized 
target material is first incubated with an 



irreversible or covalent inhibitor to inactivate 



14. The method of claim 1 wherein the replicable 
genetic package is a cell or virus that can be 
affinity separated and retain viability. 

15. The method of claim 5 wherein the known binding 
protein is an enzyme, the activity of which has a 
deleterious effect on the replicable genetic 
package, the host of the replicable genetic 
package, or the target, wherein the majority of 
the nucleic acid constructs code on expression or 
an analogue of the known binding protein that does 
not have such deleterious enzymatic activity. 

16. The method of claim 1 wherein the target contains 
ionizable groups and the pH of the solutions of 
the intended use and the pH of the affinity 
separations are chosen so that both the potential 
binding protein and the target remain stable. 

17. The method of claim 1 wherein the target contains 
ionizable groups, further comprising providing 
counter ions to reduce electrostatic repulsion 
between the potential binding protein and the 
target. 



* 



the protease. 
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18. The method of claim 1 wherein the initial 
potential binding domain is picked so that, under 
the conditions of intended use of the desired 
binding protein and under the conditions of 
affinity separation, that the potential binding 
domains and the target will either have opposite 
charge or one of them will be neutral. 

19. The method of claim 28 wherein the replicable 
genetic package is a bacterial cell, such as 

a strain of Escherichia soli. 

20. The method of claim 1 wherein the replicable 
genetic package is a bacterial spore such as 

a Bacillus endospore, more preferably an endospore 
of a strain of JL. subtilis . 

21. The method of claim 1 wherein the replicable 
genetic package is a bacteriophage, such as a 
filamentous phage, preferably a derivative of an 
M13 Escherichia coli bacteriophage or derivative 
of the Pseudomonas aeruginosa filamentous phage 
Pfl. 

22. The method of claim 21 wherein the signal is 
provided by the coat protein of M13 or a segment 
thereof embodying an outer surface transport 
signal . 

23. The method of claim 21 wherein the signal is 
provided by the gene III protein of M13 or a 
segment thereof embodying an outer surface 
transport signal, 
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24. The method of claim 2 wherein the distribution of 
nucleotides incorporated at each variegated codon 
is chosen to yield substantially equal abundances 
of acidic and basic amino acids r and, preferably 
the distribution of nucleotides incorporated at 
each variegated codon is further chosen to yield 
the largest value for the quantity {(1.- 
abundance (stop codons) ) times (abundance of the 
least abundant amino acid)/ (abundance of the most 
abundant amino acid) } • 

25. The method of claim 1, wherein step (c) further 
comprises contacting the packages with a second 
material and isolating packages which do not bind 
that second material. 

26. The method of claim 1, wherein after obtaining a 
• novel binding protein recognizing a first 

predetermined target, the novel binding protein is 
chosen as a parental potential binding protein for 
the isolation of a derivative protein which also 
binds to a second predetermined target. 

27. The method of claim 3 wherein the initially chosen 
parental potential binding domain is selected from 
the group consisting of (a) binding domains of 
bovine pancreatic trypsin inhibitor, crambin, 
ovomucoid, T4 lysozyme, hen egg white lysozyme, 
ribonuclease, and azurin, and (b) domains at least 
50% homologous with any of the foregoing domains 
and which have a melting point of at least 60°C. 
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28. The method of claim 36 wherein the outer surface 
transport signal is provided by the lamB protein 
or a segment thereof embodying an outer surface 
transport signal. 

29. The method of claim 38 wherein the outer surface 
transport signal is provided by the cotA, cotB, 
cote or cotD protein or a segment thereof 
embodying an outer surface transport signal. 

30. A chimeric protein comprising (i) at least a 
segment of an outer surface protein of a cell or 
virus, said segment providing an outer surface 
transport signal recognized by said cell or virus, 
and (ii) a domain foreign to said outer surface 
protein, and, preferably, said foreign domain 
binds to a target material not preferentially 
bound by said outer surface protein. 

31. A replicable genetic package which contains a 



nucleic acid construct which codes on expression 
for the chimeric protein of claim 30. 

32. The method of claim 1 wherein in at least one 
instance the amino acid residues varied in a first 
assortment of potential binding domains are left 
constant in the next assortment of potential 
binding domains. 

33. A method of preparing a population of variegated 
DNA wherein the distribution of nucleotides 
incorporated at each variegated codon is chosen to 
yield substantially equal abundances of acidic and 
basic amino acids, and, preferably, the 
distribution of nucleotides incorporated at each 
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variegated codon is further chosen to yield the 
largest value for the quantity { (l. -abundance (stop 
codons) } times (abundance of the least abundant 
amino acid) /(abundance of the most abundant amino 
5 acid) } . 

34. The protein of claim 66 , wherein the protein 
comprises a first foreign domain recognizing a 
first target material and a second foreign domain 

10 recognizing a second target material. 

35. The method of claim 3 wherein the initially chosen 
parental potential binding domain is at least 50% 
homologous with the binding domain of bovine 

15 pancreatic trypsin inhibitor. 
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